Description

FIELD OF THE INVENTION 

[0001] The present invention relates to isolated polynucleotides that represent a complete gene, or a fragment there- 
of, that is expressed. In addition, the present invention relates to the polypeptide or protein corresponding to the coding 
sequence of these polynucleotides. The present invention also relates to isolated polynucleotides that represent reg- 
ulatory regions of genes. The present invention also relates to isolated polynucleotides that represent untranslated 
regions of genes. The present invention further relates to the use of these isolated polynucleotides and polypeptides 
and proteins. 

DESCRIPTION OF THE RELATED ART 

[0002] Efforts to map and sequence the genome of a number of organisms are in progress; a few complete genome
sequences, for example those of E. coli and Saccharomyces cerevisiae, are known (Blattner et al., Science 277:1453
(1997); Goffeau et al., Science 274:546 (1996)). The complete genome of a multicellular organism, C. elegans, has
also been sequenced (See, the C. elegans Sequencing Consortium, Science 282:2012 (1998)). To date, no complete
genome of a plant has been sequenced, nor has a complete cDNA complement of any plant been sequenced.

SUMMARY OF THE INVENTION 

[0003] The present invention comprises polynucleotides, such as complete cDNA sequences and/or sequences of
genomic DNA encompassing complete genes, fragments of genes, and/or regulatory elements of genes and/or regions
with other functions and/or intergenic regions, hereinafter collectively referred to as Sequence-Determined DNA
Fragments (SDFs), from different plant species, particularly corn, wheat, soybean, rice and Arabidopsis thaliana, and other
plants and/or mutants, variants, fragments or fusions of said SDFs and polypeptides or proteins derived therefrom. In
some instances, the SDFs span the entirety of a protein-coding segment. In some instances, the entirety of an mRNA
is represented. Other objects of the invention that are also represented by SDFs of the invention are control sequences,
such as, but not limited to, promoters. Complements of any sequence of the invention are also considered part of the
invention.

[0004] Other objects of the invention are polynucleotides comprising exon sequences, polynucleotides comprising 
intron sequences, polynucleotides comprising introns together with exons, intron/exon junction sequences, 5' untrans- 
lated sequences, and 3' untranslated sequences of the SDFs of the present invention. Polynucleotides representing 
the joinder of any exons described herein, in any arrangement, for example, to produce a sequence encoding any 
desirable amino acid sequence are within the scope of the invention. 

[0005] The present invention also resides in probes useful for isolating and identifying nucleic acids that hybridize
to an SDF of the invention. The probes can be of any length, but more typically are 12-2000 nucleotides in length;
more typically, 15 to 200 nucleotides long; even more typically, 18 to 100 nucleotides long.

[0006] Yet another object of the invention is a method of isolating and/or identifying nucleic acids using the following 
steps: 

(a) contacting a probe of the instant invention with a polynucleotide sample under conditions that permit hybridi- 
zation and formation of a polynucleotide duplex; and 

(b) detecting and/or isolating the duplex of step (a). 

[0007] The conditions for hybridization can be from low to moderate to high stringency conditions. The sample can 
include a polynucleotide having a sequence unique in a plant genome. Probes and methods of the invention are useful, 
for example, without limitation, for mapping of genetic traits and/or for positional cloning of a desired fragment of ge- 
nomic DNA. 

[0008] Probes and methods of the invention can also be used for detecting alternatively spliced messages within a
species. Probes and methods of the invention can further be used to detect or isolate related genes in other plant
species using genomic DNA (gDNA) and/or cDNA libraries. In some instances, especially when longer probes and low
to moderate stringency hybridization conditions are used, the probe will hybridize to a plurality of cDNA and/or gDNA
sequences of a plant. This approach is useful for isolating representatives of gene families which are identifiable by
possession of a common functional domain in the gene product or which have common cis-acting regulatory sequences.
This approach is also useful for identifying orthologous genes from other organisms.

[0009] The present invention also resides in constructs for modulating the expression of the genes comprised of all
or a fragment of an SDF. The constructs comprise all or a fragment of the expressed SDF, or of a complementary
promoters, introns, untranslated regions [...] parts thereof. Constructs [...] regions, DNA and chromatin
conformation [...] enhancing or reducing [...] be constructed using plasmid, bacterial artificial chromosomes [...],
plant artificial chromosomes or other types of [...] autonomous sequences or
as DNA integrated into the genome. When inserted [...] with, or operably linked to, a heterologous [...]
operably linked to a promoter that is [...] coding region from an SDF [...]

[0010] The present invention also resides in host cells, including [...] cells of plant cells [...],
that harbor constructs such as described above. [...] methods for modulating
expression of specific genes in plants by expression [...] endogenous [...]
in a plant. Methods of modulation of gene expression include [...] the polynucleotides of the invention [...] inserting [...] additional
copies of a polynucleotide comprising a [...] endogenous promoter in a host cell; (3)
inserting antisense or ribozyme constructs [...] a sequence encoding [...]

BRIEF DESCRIPTION OF THE TABLES 

[0011] [...] Tables 1 and 2. The REF Tables refer to a number of 'Maximum Length Sequences' or 'MLS' [...] Sequence Tables 1 and 2. SEQ [...]
to the longest cDNA obtained, either by cloning or by [...] MLS [...]
sequence [...]

[0012] The REF Table includes the following information relating to each MLS:

I. cDNA Sequence 

A. 5' UTR

B. Coding Sequence
C. 3' UTR






II. Genomic Sequence 

A. Exons 

B. Introns 

C. Promoters 

III. Link of cDNA Sequences to Clone IDs

IV. Multiple Transcription Start Sites
V. Polypeptide Sequences

A. Signal Peptide
B. Domains

C. Related Polypeptides 

VI. Related Polynucleotide Sequences 
I. cDNA SEQUENCE

[0013] [...] included in the REF Tables [...]

A. 5' UTR

[0014] The location of the 5' UTR can be determined by comparing the most 5' MLS sequence with the corresponding
genomic sequence as indicated in the REF Tables. The sequence that matches, beginning at any of the transcriptional
start sites and ending at the last nucleotide before any of the translational start sites, corresponds to the 5' UTR.

B. Coding Region

[0015] The coding region is the sequence in any open reading frame found in the MLS. Coding regions of interest
are indicated in the PolyP SEQ subsection of the REF Tables.

C. 3' UTR

[0016] The location of the 3' UTR can be determined by comparing the most 3' MLS sequence with the corresponding
genomic sequence as indicated in the REF Tables. The sequence that matches, beginning at the translational stop
site and ending at the last nucleotide of the MLS, corresponds to the 3' UTR.

II. GENOMIC SEQUENCE

[0017] Further, the REF Tables indicate the specific gi number of the genomic sequence if the sequence resides in
a public databank. For each genomic sequence, the REF Tables indicate which regions are included in the MLS. These
regions can include the 5' and 3' UTRs as well as the coding sequence of the MLS. See, for example, the scheme below:



             Region 1             Region 2            Region 3
Promoter | 5' UTR | Exon | Intron | Exon | Intron | Exon | 3' UTR
                  |                                      |
      Translational Start Site                      Stop Codon



[0018] The REF Tables report the first and last base of each region that are included in an MLS sequence. An example
is shown below:

gi No. 47000:
37102 ... 37497

37593 ... 37925

The numbers indicate that the MLS contains the following sequences from two regions of gi No. 47000: a first region
including bases 37102-37497, and a second region including bases 37593-37925.
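
As a rough illustration of how an entry of this kind might be read programmatically, the following Python sketch parses region lines of the form shown above and computes the length of each region. The function names `parse_regions` and `region_length` and the list-of-strings input are assumptions made for this example only; they are not part of the REF Tables.

```python
import re

def parse_regions(entry_lines):
    """Parse lines such as '37102 ... 37497' into (first_base, last_base) tuples.

    Hypothetical helper: lines that do not match the pattern are skipped.
    """
    regions = []
    for line in entry_lines:
        match = re.fullmatch(r"\s*(\d+)\s*\.\.\.\s*(\d+)\s*", line)
        if match:
            regions.append((int(match.group(1)), int(match.group(2))))
    return regions

def region_length(region):
    """Length in bases, counting both the first and the last base."""
    first, last = region
    return last - first + 1

# The example entry for gi No. 47000 given in the text.
regions = parse_regions(["37102 ... 37497", "37593 ... 37925"])
lengths = [region_length(r) for r in regions]
print(regions)  # [(37102, 37497), (37593, 37925)]
print(lengths)  # [396, 333]
```

Because the REF Tables report the first and last base of each region, both endpoints are counted when computing a length.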

A. EXON SEQUENCES

[0019] The location of the exons can be determined by comparing the sequence of the regions from the genomic
sequences with the corresponding MLS sequence as indicated by the REF Tables.

i. INITIAL EXON

[0020] To determine the location of the initial exon, information from the

(1) polypeptide sequence section;
(2) cDNA polynucleotide section; and

(3) the genomic sequence section

of the REF Tables is used. First, the polypeptide section will indicate where the translational start site is located in
the MLS sequence. The MLS sequence can be matched to the genomic sequence that corresponds to the MLS. Based
on the match between the MLS and corresponding genomic sequences, the location of the translational start site can
be determined in one of the regions of the genomic sequence. The location of this translational start site is the start of
the first exon.

[0021] Generally, the last base of the exon of the corresponding genomic region, in which the translational start site

ii. INTERNAL EXONS

[0023] Except for the regions that comprise the 5' and 3' UTRs, initial exon and terminal exon, the remaining genomic
regions that match the MLS sequence are the internal exons. [...] remaining regions also define the intron/exon junctions [...]

iii. TERMINAL EXON

[0024] As with the initial exon, the location of the terminal exon is determined with information from the

(1) polypeptide sequence section; 

(2) cDNA polynucleotide section; and 

(3) the genomic sequence section 

[...] MLS and corresponding genomic sequences [...] determined in one of the regions of the
genomic sequence. The location of this stop codon [...] of the exon
of the corresponding genomic region, in which the stop codon was located, will
represent the beginning of the terminal exon. [...] translational start site will represent the start of the
terminal exon, which will be the only exon.

B. INTRON SEQUENCES



C. PROMOTER SEQUENCES

III. LINK of cDNA SEQUENCES to CLONE IDs

[...] MLS that is included in the clone. If either the 5' or 3' [...] cDNA clone sequence is the same as the MLS
sequence, no mention will be made.






IV. Multiple Transcription Start Sites 



[...] The [...] location [...] can be either [...]
that corresponds to the MLS.

[0030] To determine the location of the transcription start sites with the negative numbers, the MLS sequence is
aligned with the corresponding genomic sequence. In the instances when a public genomic sequence is referenced,
the relevant corresponding genomic sequence can be found by direct reference to the nucleotide sequence indicated
by the gi number shown in the public genomic DNA section of the REF Tables. When the position is a negative number,
the transcription start site is located in the corresponding genomic sequence upstream of the base that matches the
beginning of the MLS sequence in the alignment. The negative number is relative to the first base of the MLS sequence
which matches the genomic sequence corresponding to the relevant gi number.
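
The arithmetic implied by a negative position can be sketched as follows. This is only an illustration under stated assumptions: the MLS is taken to align to the forward strand of the genomic sequence, a position of -1 is taken to mean the base immediately upstream of the first matching base, and the function name `tss_genomic_position` is hypothetical.

```python
def tss_genomic_position(mls_first_base_in_genome, tss_position):
    """Map a negative transcription start site position onto genomic coordinates.

    Assumptions of this sketch (not stated in the REF Tables):
    - the MLS aligns to the forward strand of the genomic sequence;
    - position -1 is the base immediately upstream of the first matching base.
    """
    if tss_position >= 0:
        raise ValueError("this sketch only handles negative positions")
    return mls_first_base_in_genome + tss_position

# If the first base of the MLS matches genomic base 37102, a start site
# reported at -25 would lie at genomic base 37077 under these assumptions.
example_position = tss_genomic_position(37102, -25)
print(example_position)  # 37077
```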

[0031] In the instances when no public genomic DNA is referenced, the relevant nucleotide sequence for alignment
is the nucleotide sequence associated with the amino acid sequence designated by gi number of the later PolyP SEQ
subsection.

V. Polypeptide Sequences

[0032] The PolyP SEQ subsection lists SEQ ID NOs and Ceres SEQ ID NO for polypeptide sequences corresponding
to the coding sequence of the MLS sequence and the location of the translational start site within the coding sequence
of the MLS sequence.

[0033] The MLS sequence can have multiple translational start sites and can be capable of producing more than
one polypeptide sequence.

A. Signal Peptide 

[0034] The REF Tables also indicate in subsection (B) the cleavage site of the putative signal peptide of the polypep- 
tide corresponding to the coding sequence of the MLS sequence. Typically, signal peptide coding sequences comprise 
a sequence encoding the first residue of the polypeptide to the cleavage site residue. 

B. Domains 

[0035] Subsection (C) provides information regarding identified domains (where present) within the polypeptide and 
(where present) a name for the polypeptide domain. 

C. Related Polypeptides 

[0036] Subsection (Dp) provides (where present) information concerning amino acid sequences that are found to be
related and have some percentage of sequence identity to the polypeptide sequences of REF and SEQ TABLES 1
AND 2. These related sequences are identified by a gi number.

VI. Related Polynucleotide Sequences 



[0037] Subsection (Dn) provides polynucleotide sequences (where present) that are related to and have some per- 
centage of sequence identity to the MLS or corresponding genomic sequence. 



Abbreviation                    Description

Max Len. Seq.                   Maximum Length Sequence
rel to                          Related to
Clone Ids                       Clone ID numbers
Pub gDNA                        Public Genomic DNA
gi No.                          gi number
Gen. seq. in cDNA               Genomic Sequence in cDNA (Each region for a single gene prediction is
                                listed on a separate line. In the case of multiple gene predictions, the
                                group of regions relating to a single prediction are separated by a blank line)
(Ac) cDNA SEQ                   cDNA sequence
- Pat. Appln. SEQ ID NO         Patent Application SEQ ID NO:
- Ceres SEQ ID NO: 1673877      Ceres SEQ ID NO:
- SEQ # w. TSS                  [...]
- Clone ID #: # -> #            Clone ID # comprises bases # to # of the cDNA Sequence
PolyP SEQ                       Polypeptide Sequence
- Pat. Appln. SEQ ID NO:        Patent Application SEQ ID NO
- Ceres SEQ ID NO               [...]
(Title)                         [...]
- Loc. SEQ ID NO #: # -> # aa.  [...]
(Dp) Rel. AA SEQ                Related Amino Acid Sequences
- Align. NO                     Alignment number
- gi No                         Gi number
- Desp.                         Description
- % Idnt.                       Percent identity
- Align. Len.                   Alignment Length
- Loc. SEQ ID NO: # -> # [...]  Location within SEQ ID NO: from # to # amino [...]
DETAILED DESCRIPTION OF THE INVENTION

[0038] The invention relates to (I) polynucleotides and methods of use thereof, such as 

IA. Probes, Primers and Substrates; 

IB. Methods of Detection and Isolation; 

B.1. Hybridization; 
B.2. Methods of Mapping; 
B.3. Southern Blotting; 

B.4. Isolating cDNA from Related Organisms;

B.5. Isolating and/or Identifying Orthologous Genes

IC. Methods of Inhibiting Gene Expression

C.1. Antisense
C.2. Ribozyme Constructs;
C.3. Chimeraplasts;
C.4. Co-Suppression;
C.5. Transcriptional Silencing
C.6. Other Methods to Inhibit Gene Expression

ID. Methods of Functional Analysis;

IE. Promoter Sequences and Their Use;
IF. UTRs and/or Intron Sequences and Their Use; and

IG. Coding Sequences and Their Use.

[0039] The invention also relates to (II) polypeptides and proteins and methods of use thereof, such as
IIA. Native Polypeptides and Proteins

A.1 Antibodies

A.2 In Vitro Applications

IIB. Polypeptide variants, Fragments and Fusions

B.1 Variants
B.2 Fragments
B.3 Fusions

[0040] The invention also includes (III) methods of modulating polypeptide production, such as

IIIA. Suppression

A.1 Antisense
A.2 Ribozymes
A.3 Co-suppression

A.4 Insertion of Sequences into the Gene to be Modulated
A.5 Promoter Modulation

A.6 Expression of Genes containing Dominant-Negative Mutations
IIIB. Enhanced Expression

B.1 Insertion of an Exogenous Gene
B.2 Promoter Modulation

[0041] The invention further concerns (IV) gene constructs and vector construction, such as

IVA. Coding Sequences
IVB. Promoters
IVC. Signal Peptides

[0042] The invention still further relates to
V. Transformation Techniques

Definitions 

[0043] Allelic variant: An 'allelic variant' is an alternative form of the same SDF, which resides at the same
chromosomal locus in the organism. Allelic variations can occur in any portion of the gene sequence, including regulatory
regions. Allelic variants can arise by normal genetic variation in a population. Allelic variants can also be produced by
genetic engineering methods. An allelic variant can be one that is found in a naturally occurring plant, including a
cultivar or ecotype. An allelic variant may or may not give rise to a phenotypic change, and may or may not be expressed.
An allele can result in a detectable change in the phenotype of the trait represented by the locus. A phenotypically
silent allele can give rise to a product.

[0044] Alternatively spliced messages: Within the context of the current invention, 'alternatively spliced
messages' refers to mature mRNAs originating from a single gene with variations in the number and/or identity of exons,
introns and/or intron-exon junctions.

[0045] Chimeric: The term 'chimeric' is used to describe genes, as defined supra, or constructs wherein at least
two of the elements of the gene or construct, such as the promoter and the coding sequence and/or other regulatory
sequences and/or filler sequences and/or complements thereof, are heterologous to each other.
[0046] Constitutive Promoter: Promoters referred to herein as 'constitutive promoters' actively promote transcription
under most, but not necessarily all, environmental conditions and states of development or cell differentiation. Examples
polynucleotides of the present invention, is PARSK1, the promoter from the Arabidopsis gene encoding a serine-threonine
kinase enzyme, and which promoter is induced by dehydration, abscisic acid and sodium chloride (Wang and
Goodman, Plant J. 8:37 (1995)). Examples of environmental conditions that may affect transcription by inducible promoters
include anaerobic conditions, elevated temperature, or the presence of light.

[0057] Intergenic region: 'Intergenic region,' as used in the current invention, refers to nucleotide sequence
occurring in the genome that separates adjacent genes.

[0058] Mutant gene: In the current invention, 'mutant' refers to a heritable change in DNA sequence at a specific
location. Mutants of the current invention may or may not have an associated identifiable function when the mutant
gene is transcribed.

[0059] Orthologous Gene: In the current invention, 'orthologous gene' refers to a second gene that encodes a
gene product that performs a similar function as the product of a first gene. The orthologous gene may also have a
degree of sequence similarity to the first gene. The orthologous gene may encode a polypeptide that exhibits a degree
of sequence similarity to a polypeptide corresponding to a first gene. The sequence similarity can be found within a
functional domain or along the entire length of the coding sequence of the genes and/or their corresponding
polypeptides.

[0060] Percentage of sequence identity: 'Percentage of sequence identity,' as used herein, is determined by
comparing two optimally aligned sequences over a comparison window, where the fragment of the polynucleotide or
amino acid sequence in the comparison window may comprise additions or deletions (e.g., gaps or overhangs) as
compared to the reference sequence (which does not comprise additions or deletions) for optimal alignment of the two
sequences. The percentage is calculated by determining the number of positions at which the identical nucleic acid
base or amino acid residue occurs in both sequences to yield the number of matched positions, dividing the number
of matched positions by the total number of positions in the window of comparison and multiplying the result by 100
to yield the percentage of sequence identity. Optimal alignment of sequences for comparison may be conducted by
the local homology algorithm of Smith and Waterman, Adv. Appl. Math. 2:482 (1981), by the homology alignment
algorithm of Needleman and Wunsch, J. Mol. Biol. 48:443 (1970), by the search for similarity method of Pearson and
Lipman, Proc. Natl. Acad. Sci. (USA) 85:2444 (1988), by computerized implementations of these algorithms (GAP,
BESTFIT, BLAST, FASTA, and TFASTA in the Wisconsin Genetics Software Package, Genetics Computer Group
(GCG), 575 Science Dr., Madison, WI), or by inspection. Given that two sequences have been identified for comparison,
GAP and BESTFIT are preferably employed to determine their optimal alignment. Typically, the default values of 5.00
for gap weight and 0.30 for gap weight length are used. The term 'substantial sequence identity' between polynucleotide
or polypeptide sequences refers to polynucleotide or polypeptide comprising a sequence that has at least 80%
sequence identity, preferably at least 85%, more preferably at least 90% and most preferably at least 95%, even more
preferably, at least 96%, 97%, 98% or 99% sequence identity compared to a reference sequence using the programs.
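
The calculation described in this definition can be sketched in Python as shown below. The sketch assumes the two sequences have already been optimally aligned by one of the programs named above (the alignment step itself is not shown) and that gaps are written as '-'. The function names and the example sequences are hypothetical and serve only to illustrate the arithmetic.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of sequence identity over a comparison window.

    Both inputs are rows of an existing alignment, so they must have the
    same length. Matched positions are divided by the total number of
    positions in the window, gap columns included, and multiplied by 100.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("comparison window is empty")
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != gap
    )
    return 100.0 * matches / len(aligned_a)

def has_substantial_identity(identity_percent, threshold=80.0):
    """True if the identity meets the 80% minimum given in the definition."""
    return identity_percent >= threshold

# Hypothetical 9-column alignment: 7 matched positions out of 9.
identity = percent_identity("ACGT-ACGT", "ACGTTACCT")
print(round(identity, 2))                  # 77.78
print(has_substantial_identity(identity))  # False
```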
[0061] Plant Promoter: A 'plant promoter' is a promoter capable of initiating transcription in plant cells and can
drive or facilitate transcription of a fragment of the SDF of the instant invention or a coding sequence of the SDF of the
instant invention. Such promoters need not be of plant origin. For example, promoters derived from plant viruses, such
as the CaMV35S promoter or from Agrobacterium tumefaciens such as the T-DNA promoters, can be plant promoters.
A typical example of a plant promoter of plant origin is the maize ubiquitin-1 (ubi-1) promoter known to those of skill.
[0062] Promoter: The term 'promoter,' as used herein, refers to a region of sequence determinants located
upstream from the start of transcription of a gene and which are involved in recognition and binding of RNA polymerase
and other proteins to initiate and modulate transcription. A basal promoter is the minimal sequence necessary for
assembly of a transcription complex required for transcription initiation. Basal promoters frequently include a 'TATA
box' element usually located between 15 and 35 nucleotides upstream from the site of initiation of transcription. Basal
promoters also sometimes include a 'CCAAT box' element (typically a sequence CCAAT) and/or a GGGCG sequence
usually located between 40 and 200 nucleotides, preferably 60 to 120 nucleotides, upstream from the start site of
transcription.

[0063] Public sequence: The term 'public sequence,' as used in the context of the instant application, refers to
any sequence that has been deposited in a publicly accessible database. This term encompasses both amino acid
and nucleotide sequences. Such sequences are publicly accessible, for example, on the BLAST databases on the
NCBI FTP web site (accessible at ncbi.nlm.gov/blast). The database at the NCBI FTP site utilizes gi numbers assigned
by NCBI as a unique identifier for each sequence in the databases, thereby providing a non-redundant database for
sequence from various databases, including GenBank, EMBL, DDBJ (DNA Database of Japan) and PDB (Brookhaven
Protein Data Bank).

[0064] Regulatory Sequence: The term 'regulatory sequence,' as used in the current invention, refers to any
nucleotide sequence that influences transcription or translation initiation and rate, and stability and/or mobility of the
transcript or polypeptide product. Regulatory sequences include, but are not limited to, promoters, promoter control
elements, protein binding sequences, 5' and 3' UTRs, transcriptional start site, termination sequence, polyadenylation
sequence, introns, certain sequences within a coding sequence, etc.
of the total A+B in the composition is A. Preferably, A comprises at least about 90% by weight of the total of A+B in
the composition, more preferably at least about 95% or even 99% by weight. For example, a plant gene or a DNA
sequence can be considered substantially free of other plant genes or DNA sequences.

[0074] Translational start site: In the context of the current invention, a 'translational start site' is usually an ATG
in the cDNA transcript, more usually the first ATG. A single cDNA, however, may have multiple translational start sites.
[0075] Transcription start site: 'Transcription start site' is used in the current invention to describe the point at
which transcription is initiated. This point is typically located about 25 nucleotides downstream from a TFIID binding
site, such as a TATA box. Transcription can initiate at one or more sites within the gene, and a single gene may have
multiple transcriptional start sites, some of which may be specific for transcription in a particular cell-type or tissue.
[0076] Untranslated region (UTR): A 'UTR' is any contiguous series of nucleotide bases that is transcribed, but
is not translated. These untranslated regions may be associated with particular functions such as increasing mRNA
message stability. Examples of UTRs include, but are not limited to, polyadenylation signals, termination sequences,
sequences located between the transcriptional start site and the first exon (5' UTR) and sequences located between
the last exon and the end of the mRNA (3' UTR).

[0077] Variant: The term 'variant' is used herein to denote a polypeptide or protein or polynucleotide molecule
that differs from others of its kind in some way. For example, polypeptide and protein variants can consist of changes
in amino acid sequence and/or charge and/or post-translational modifications (such as glycosylation, etc.).

DETAILED DESCRIPTION OF THE INVENTION 

I. Polynucleotides 

[0078] Exemplified SDFs of the invention represent fragments of the genome of corn, wheat, rice, soybean or Ara- 
bidopsis and/or represent mRNA expressed from that genome. The isolated nucleic acid of the invention also encom- 
passes corresponding fragments of the genome and/or cDNA complement of other organisms as described in detail 
below. 

[0079] Polynucleotides of the invention can be isolated from polynucleotide libraries using primers comprising se- 
quence similar to those described by the REF and SEQ Tables. See, for example, the methods described in Sambrook 
et al., supra. 

[0080] Alternatively, the polynucleotides of the invention can be produced by chemical synthesis. Such synthesis 
methods are described below. 

[0081] It is contemplated that the nucleotide sequences presented herein may contain some small percentage of 
errors. These errors may arise in the normal course of determination of nucleotide sequences. Sequence errors can 
be corrected by obtaining seeds deposited under the accession numbers cited above, propagating them, isolating 
genomic DNA or appropriate mRNA from the resulting plants or seeds thereof, amplifying the relevant fragment of the 
genomic DNA or mRNA using primers having a sequence that flanks the erroneous sequence, and sequencing the 
amplification product. 

I. A. Probes, Primers and Substrates 

[0082] SDFs of the invention can be applied to substrates for use in array applications such as, but not limited to, 
assays of global gene expression, for example under varying conditions of development, growth conditions. The arrays 
can also be used in diagnostic or forensic methods (WO95/35505, US 5,445,943 and US 5,410,270). 
[0083] Probes and primers of the instant invention will hybridize to a polynucleotide comprising a sequence in REF 
and SEQ TABLES 1 AND 2. Though many different nucleotide sequences can encode an amino acid sequence, the 
sequences of REF and SEQ TABLES 1 AND 2 are generally preferred for encoding polypeptides of the invention. 
However, the sequence of the probes and/or primers of the instant invention need not be identical to those in REF and 
SEQ TABLES 1 AND 2 or the complements thereof. For example, some variation in probe or primer sequence and/or
length can allow additional family members to be detected, as well as orthologous genes and more taxonomically
distant related sequences. Similarly, probes and/or primers of the invention can include additional nucleotides that 
serve as a label for detecting the formed duplex or for subsequent cloning purposes. 

[0084] Probe length will vary depending on the application. For use as primers, probes are 12-40 nucleotides,
preferably 18-30 nucleotides long. For use in mapping, probes are preferably 50 to 500 nucleotides, more preferably 100-250
nucleotides long. For Southern hybridizations, probes as long as several kilobases can be used as explained below.
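
By way of illustration only, the primer and mapping length ranges stated above could be encoded as a simple lookup. The names `PROBE_LENGTH_RANGES` and `classify_probe_length` and the labels 'preferred', 'acceptable' and 'outside' are assumptions of this sketch. Southern hybridization probes are omitted because the text gives no numeric range for them.

```python
# Length ranges in nucleotides, taken from the paragraph above.
PROBE_LENGTH_RANGES = {
    "primer": {"acceptable": (12, 40), "preferred": (18, 30)},
    "mapping": {"acceptable": (50, 500), "preferred": (100, 250)},
}

def classify_probe_length(length, application):
    """Return 'preferred', 'acceptable' or 'outside' for a probe length."""
    ranges = PROBE_LENGTH_RANGES[application]
    low, high = ranges["preferred"]
    if low <= length <= high:
        return "preferred"
    low, high = ranges["acceptable"]
    if low <= length <= high:
        return "acceptable"
    return "outside"

print(classify_probe_length(20, "primer"))    # preferred
print(classify_probe_length(35, "primer"))    # acceptable
print(classify_probe_length(600, "mapping"))  # outside
```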
[0085] The probes and/or primers can be produced by synthetic procedures such as the triester method of Matteucci
et al. J. Am. Chem. Soc. 103:3185 (1981); or according to Urdea et al. Proc. Natl. Acad. Sci. 80:7461 (1981) or using
commercially available automated oligonucleotide synthesizers.
associated with a phenotype, all SDFs can be used as probes for identifying polymorphisms associated with phenotypes
of interest. Briefly, one method of mapping involves total DNA isolation from individuals. It is subsequently cleaved with
one or more restriction enzymes, separated according to mass, transferred to a solid support, hybridized with SDF
DNA and the pattern of fragments compared. Polymorphisms associated with a particular SDF are visualized as
differences in the size of fragments produced between individual DNA samples after digestion with a particular restriction
enzyme and hybridization with the SDF. After identification of polymorphic SDF sequences, linkage studies can be
conducted. By using the individuals showing polymorphisms as parents in crossing programs, F2 progeny recombinants
or recombinant inbreds, for example, are then analyzed. The order of DNA polymorphisms along the chromosomes
can be determined based on the frequency with which they are inherited together versus independently. The closer
two polymorphisms are together in a chromosome, the higher the probability that they are inherited together. Integration
of the relative positions of all the polymorphisms and associated marker SDFs can produce a genetic map of the
species, where the distances between markers reflect the recombination frequencies in that chromosome segment.
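
The idea that co-inheritance frequency reflects marker proximity can be illustrated with a minimal Python sketch. It assumes each progeny individual has been scored at each marker as carrying the allele of parent 'A' or parent 'B', and it simply counts individuals whose parental origin differs between two markers. The function name and the scores are hypothetical, and no mapping function or population-specific correction (as would be needed for a real F2 or recombinant inbred population) is applied.

```python
def recombination_fraction(marker_1, marker_2):
    """Fraction of individuals whose parental allele differs between two markers.

    Each argument is a string with one character per individual, 'A' or 'B',
    giving the parent of origin of the allele scored at that marker.
    """
    if len(marker_1) != len(marker_2) or not marker_1:
        raise ValueError("markers must be scored on the same, non-empty set")
    recombinants = sum(1 for a, b in zip(marker_1, marker_2) if a != b)
    return recombinants / len(marker_1)

# Hypothetical scores for ten progeny at three markers.
m1 = "AAAABBBBAB"
m2 = "AAABBBBBAB"
m3 = "ABABBABAAB"

r12 = recombination_fraction(m1, m2)
r13 = recombination_fraction(m1, m3)
print(r12, r13)  # 0.1 0.4
```

In this made-up data set, markers 1 and 2 are inherited together more often than markers 1 and 3, which would place marker 2 closer to marker 1 on a map.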
[0094] The use of recombinant inbred lines for such genetic mapping is described for Arabidopsis by Alonso-Blanco
et al. (Methods in Molecular Biology, vol. 82, 'Arabidopsis Protocols', pp. 137-146, J.M. Martinez-Zapater and J. Salinas,
eds., c. 1998 by Humana Press, Totowa, NJ) and for corn by Burr ('Mapping Genes with Recombinant Inbreds', pp.
249-254. In Freeling, M. and V. Walbot (Ed.), The Maize Handbook, c. 1994 by Springer-Verlag New York, Inc.: New
York, NY, USA; Berlin Germany; Burr et al. Genetics (1998) 118: 519; Gardiner, J. et al., (1993) Genetics 134: 917).
This procedure, however, is not limited to plants and can be used for other organisms (such as yeast) or for individual
cells.

[0095] The SDFs of the present invention can also be used for simple sequence repeat (SSR) mapping. Rice SSR
mapping is described by Morgante et al. (The Plant Journal (1993) 3: 165), Panaud et al. (Genome (1995) 38: 1170),
Senior et al. (Crop Science (1996) 36: 1676), Taramino et al. (Genome (1996) 39: 277) and Ahn et al. (Molecular and
General Genetics (1993) 241: 483-90). SSR mapping can be achieved using various methods. In one instance, polymorphisms
are identified when sequence specific probes contained within an SDF flanking an SSR are made and used
in polymerase chain reaction (PCR) assays with template DNA from two or more individuals of interest. Here, a change
in the number of tandem repeats between the SSR-flanking sequences produces differently sized fragments (U.S. 
Patent 5,766,847). Alternatively, polymorphisms can be identified by using the PCR fragment produced from the SSR- 
flanking sequence specific primer reaction as a probe against Southern blots representing different individuals (U.H. 
Refseth et al., (1997) Electrophoresis 18: 1519). 
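
As a minimal sketch of why a change in repeat number yields differently sized PCR fragments, the following Python example computes the amplicon length for two alleles that differ only in the number of tandem repeats. All sequences, the primer sites and the "GA" repeat unit are hypothetical and were chosen for this example; `rev_site` denotes the top-strand sequence at which the reverse primer anneals.

```python
import re


def amplicon_length(template, fwd_primer, rev_site):
    """Length of the product from the start of fwd_primer to the end of rev_site.

    Returns None if either site is absent from the template.
    """
    start = template.find(fwd_primer)
    if start < 0:
        return None
    end = template.find(rev_site, start + len(fwd_primer))
    if end < 0:
        return None
    return end + len(rev_site) - start


def repeat_count(template, unit):
    """Number of copies in the longest tandem run of `unit` in the template."""
    runs = re.findall(f"(?:{re.escape(unit)})+", template)
    return max((len(run) // len(unit) for run in runs), default=0)


# Hypothetical SSR-flanking sequences and two hypothetical alleles.
FWD = "GATTACAGGC"
REV_SITE = "TTGCCATGCA"
allele_a = "CC" + FWD + "AT" + "GA" * 8 + "CT" + REV_SITE + "GG"
allele_b = "CC" + FWD + "AT" + "GA" * 11 + "CT" + REV_SITE + "GG"

for name, allele in (("allele_a", allele_a), ("allele_b", allele_b)):
    print(name, repeat_count(allele, "GA"), "repeats,",
          amplicon_length(allele, FWD, REV_SITE), "bp")
```

The three additional dinucleotide repeats in the second allele lengthen the product by six base pairs, which is the size difference that would be resolved on a gel.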

[0096] Genetic and physical maps of crop species have many uses. For example, these maps can be used to devise
positional cloning strategies for isolating novel genes from the mapped crop species. In addition, because the genomes
of closely related species are largely syntenic (that is, they display the same ordering of genes within the genome),
these maps can be used to isolate novel alleles from relatives of crop species by positional cloning strategies. 
[0097] The various types of maps discussed above can be used with the SDFs of the invention to identify Quantitative
Trait Loci (QTLs). Many important crop traits, such as the solids content of tomatoes, are quantitative traits and result
from the combined interactions of several genes. These genes reside at different loci in the genome, oftentimes on
different chromosomes, and generally exhibit multiple alleles at each locus. The SDFs of the invention can be used to
identify QTLs and isolate specific alleles as described by de Vicente and Tanksley (Genetics 134:585 (1993)). In addition
to isolating QTL alleles in present crop species, the SDFs of the invention can also be used to isolate alleles from the
corresponding QTL of wild relatives. Transgenic plants having various combinations of QTL alleles can then be created
and the effects of the combinations measured. Once a desired allele combination has been identified, crop improvement
can be accomplished either through biotechnological means or by directed conventional breeding programs (for review
see Tanksley and McCouch, Science 277:1063 (1997)).
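
The association between a marker and a quantitative trait can be illustrated with a simple single-marker comparison, sketched below in Python. The genotype classes and trait values are invented for this example, and the comparison of class means is only the most elementary form of QTL analysis; it is offered as an assumption-laden sketch and not as the procedure of the cited references, which rely on more rigorous statistics.

```python
from statistics import mean


def class_means(genotypes, trait_values):
    """Mean trait value for each marker genotype class."""
    groups = {}
    for genotype, value in zip(genotypes, trait_values):
        groups.setdefault(genotype, []).append(value)
    return {genotype: mean(values) for genotype, values in groups.items()}


# Hypothetical data: marker genotype of eight progeny at one SDF marker
# and a hypothetical measurement of soluble solids (percent) in each.
genotypes = "AABBABAB"
solids = [4.1, 4.3, 5.6, 5.8, 4.0, 5.9, 4.2, 5.7]

means = class_means(genotypes, solids)
effect = means["B"] - means["A"]
print(means, f"difference = {effect:.2f}")
```

A consistent difference between the genotype classes across many progeny suggests that a locus affecting the trait lies near the marker.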

[0098] In another embodiment, the SDFs can be used to help create physical maps of the genome of corn, Arabidopsis
and related species. Where SDFs have been ordered on a genetic map, as described above, they can be used
as probes to discover which clones in large libraries of plant DNA fragments in YACs, BACs, etc. contain the same
SDF or similar sequences, thereby facilitating the assignment of the large DNA fragments to chromosomal positions.
Subsequently, the large BACs, YACs, etc. can be ordered unambiguously by more detailed studies of their sequence
composition (e.g. Marra et al. (1997) Genome Research 7:1072-1084) and by using their end or other sequences to
find the identical sequences in other cloned DNA fragments. The overlapping of DNA sequences in this way allows
large contigs of plant sequences to be built that, when sufficiently extended, provide a complete physical map of a
chromosome. Sometimes the SDFs themselves will provide the means of joining cloned sequences into a contig.
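
The joining of clones into contigs through shared SDF probes can be sketched as follows. The clone names and hybridization results are hypothetical, and the rule used here (two clones belong to the same contig if they share at least one SDF) is a simplifying assumption; it groups clones but does not by itself establish their order within a contig.

```python
def build_contigs(clone_hits):
    """Group clones into contigs; clones sharing at least one marker are joined.

    clone_hits maps a clone name to the set of SDF markers that hybridize to it.
    """
    contigs = []  # list of (set of clone names, set of markers)
    for clone, markers in clone_hits.items():
        clones, marks = {clone}, set(markers)
        untouched = []
        for other_clones, other_marks in contigs:
            if other_marks & marks:
                # Shared marker: merge the existing group into the new one.
                clones |= other_clones
                marks |= other_marks
            else:
                untouched.append((other_clones, other_marks))
        contigs = untouched + [(clones, marks)]
    return sorted(sorted(clones) for clones, _ in contigs)


# Hypothetical hybridization results.
clone_hits = {
    "BAC_1": {"SDF_1", "SDF_2"},
    "BAC_2": {"SDF_2", "SDF_3"},
    "BAC_3": {"SDF_3", "SDF_4"},
    "YAC_7": {"SDF_9"},
    "BAC_4": {"SDF_9", "SDF_10"},
}

contigs = build_contigs(clone_hits)
print(contigs)
```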
[0099] The patent publication WO95/35505 and U.S. Patents 5,445,943 and 5,410,270 describe scanning multiple
alleles of a plurality of loci using hybridization to arrays of oligonucleotides. These techniques are useful for each of
the types of mapping discussed above.

[0100] Following the procedures described above and using a plurality of the SDFs of the present invention, any
individual can be genotyped. These individual genotypes can be used for the identification of particular cultivars, varieties,
lines, ecotypes and genetically modified plants or can serve as tools for subsequent genetic studies involving
multiple phenotypic traits.

B.3 Southern Blot Hybridization

[...] interest to obtain a probe. Similarly, if the SDF includes a domain of interest, to [...]



B.4.1 Isolating DNA from Related Organisms



[...] The SDFs can be used to isolate the corresponding DNA from other organisms. Either cDNA
or genomic DNA can be isolated. For isolating genomic DNA, a lambda, cosmid, BAC or YAC, or other [...]
[...] by Sambrook et al. 1989 (Molecular Cloning: A Laboratory Manual, 2nd [...]).
[...] To screen a phage library, for example, recombinant lambda clones are plated out on [...]
or non-radioactively labeled SDF DNA at room temperature for about 16 hours, usually in the presence of 50% formamide
and 5X SSC (sodium chloride and sodium citrate) buffer and blocking reagents. The plaque lifts are then washed
at 42°C with 1% Sodium Dodecyl Sulfate (SDS) and at a particular concentration of SSC. The SSC concentration used 
is dependent upon the stringency at which hybridization occurred in the initial Southern blot analysis performed. For 
example, if a fragment hybridized under medium stringency (e.g., Tm - 20°C), then this condition is maintained or
preferably adjusted to a less stringent condition (e.g., Tm - 30°C) to wash the plaque lifts. Positive clones show detectable
hybridization, e.g., by exposure to X-ray films or chromogen formation. The positive clones are then subsequently
isolated for purification using the same general protocol outlined above. Once the clone is purified, restriction analysis 
can be conducted to narrow the region corresponding to the gene of interest. The restriction analysis and succeeding 
subcloning steps can be done using procedures described by, for example Sambrook et al. (1989) cited above. 
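
For illustration only, a wash temperature corresponding to a stringency such as Tm - 20°C or Tm - 30°C can be estimated as sketched below. The formula is a commonly cited empirical approximation for long DNA-DNA hybrids and is an assumption of this example; where the specification defines Tm, that definition governs. The probe length, the GC content and the sodium concentration taken for 1X SSC (about 0.195 M) are likewise assumed values.

```python
import math

# Assumed sodium ion concentration of 1X SSC, in mol/L
# (0.15 M sodium chloride plus 0.015 M trisodium citrate).
NA_MOLAR_1X_SSC = 0.195


def tm_dna_duplex(length_bp, gc_percent, na_molar, formamide_percent=0.0):
    """Approximate melting temperature (deg C) of a long DNA-DNA hybrid.

    Empirical approximation:
    Tm = 81.5 + 16.6*log10[Na+] + 0.41*(%GC) - 0.61*(%formamide) - 500/L
    """
    return (81.5
            + 16.6 * math.log10(na_molar)
            + 0.41 * gc_percent
            - 0.61 * formamide_percent
            - 500.0 / length_bp)


def wash_temperature(tm, degrees_below_tm):
    """Temperature for a wash performed a fixed number of degrees below Tm."""
    return tm - degrees_below_tm


# Hypothetical probe: 500 bp, 45% GC, washed in 0.2X SSC without formamide.
tm = tm_dna_duplex(500, 45.0, 0.2 * NA_MOLAR_1X_SSC)
print(f"Tm = {tm:.1f} C")
print(f"medium stringency (Tm - 20): {wash_temperature(tm, 20):.1f} C")
print(f"lower stringency  (Tm - 30): {wash_temperature(tm, 30):.1f} C")
```

Because the salt term enters through a logarithm, raising the SSC concentration of the wash raises Tm, so a wash at a fixed temperature becomes less stringent.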
[0111] The procedures outlined for the lambda library are essentially similar to those used for YAC library screening, 
except that the YAC clones are harbored in bacterial colonies. The YAC clones are plated out at reasonable density 
on nitrocellulose or nylon filters supported by appropriate bacterial medium in petri plates. Following the growth of the 
bacterial clones, the filters are processed through the denaturation, neutralization, and washing steps following the 
procedures of Ausubel et al. 1992. The same hybridization procedures for lambda library screening are followed.
[0112] To isolate cDNA, similar procedures using appropriately modified vectors are employed. For instance, the 
library can be constructed in a lambda vector appropriate for cloning cDNA such as λgt11. Alternatively, the cDNA
library can be made in a plasmid vector. cDNA for cloning can be prepared by any of the methods known in the art, 
but is preferably prepared as described above. Preferably, a cDNA library will include a high proportion of full-length 
clones. 

B.5. Isolating and/or Identifying Orthologous Genes

[0113] Probes and primers of the invention can be used to identify and/or isolate polynucleotides related to those in
REF and SEQ TABLES 1 AND 2. Related polynucleotides are those that are native to other plant organisms and either
exhibit similar sequence or encode polypeptides with similar biological activity. One specific example is an orthologous
gene. Orthologous genes have the same functional activity. As such, orthologous genes may be distinguished from
homologous genes. The percentage of identity is a function of evolutionary separation and, in closely related species,
the percentage of identity can be 98 to 100%. The amino acid sequence of a protein encoded by an orthologous gene
can be less than 75% identical, but tends to be at least 75% or at least 80% identical, more preferably at least 90%,
most preferably at least 95% identical to the amino acid sequence of the reference protein. To find orthologous genes,
the probes are hybridized to nucleic acids from a species of interest under low stringency conditions, preferably one
where sequences containing as much as 40-45% mismatches will be able to hybridize. This condition is established
by Tm - 40°C to Tm - 48°C (see below). Blots are then washed under conditions of increasing stringency. It is preferable
that the wash stringency be such that sequences that are 85 to 100% identical will hybridize. More preferably, sequences
90 to 100% identical will hybridize and most preferably only sequences greater than 95% identical will hybridize. One
of ordinary skill in the art will recognize that, due to degeneracy in the genetic code, amino acid sequences that are 
identical can be encoded by DNA sequences as little as 67% identical or less. Thus, it is preferable, for example, to 
make an overlapping series of shorter probes, on the order of 24 to 45 nucleotides, and individually hybridize them to 
the same arrayed library to avoid the problem of degeneracy introducing large numbers of mismatches. 
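
The effect of degeneracy noted above can be shown with a short Python sketch using the standard genetic code. The four DNA sequences are hypothetical and were chosen for this example: the first pair differs only at third codon positions, and the second pair uses amino acids (leucine, serine, arginine) that are each encoded by six codons.

```python
from itertools import product

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate a DNA coding sequence read in frame from its first base."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def percent_identity(a, b):
    """Percentage of identical positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


# Hypothetical sequences encoding the same peptides with different codons.
wobble_1, wobble_2 = "GGTGCTGTTCCT", "GGCGCCGTCCCC"   # third positions differ
sixfold_1, sixfold_2 = "CTTTCTCGT", "TTAAGCAGA"        # six-codon amino acids

for s1, s2 in ((wobble_1, wobble_2), (sixfold_1, sixfold_2)):
    print(translate(s1), translate(s2), f"{percent_identity(s1, s2):.1f}% identical")
```

Both pairs encode identical peptides, yet the nucleotide identity is about 67% in the first pair and far lower in the second, which is why short probes drawn from several regions are preferable to a single long probe.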
[0114] As evolutionary divergence increases, genome sequences also tend to diverge. Thus, one of skill will recognize
that searches for orthologous genes between more divergent species will require the use of lower stringency
conditions compared to searches between closely related species. Also, degeneracy of the genetic code is more of a
problem for searches in the genome of a species more distant evolutionarily from the species that is the source of the
SDF probe sequences. 

[0115] The SDFs of the invention can also be used as probes to search for genes that are related to the SDF within 
a species. Such related genes are typically considered to be members of a gene family. In such a case, the sequence 
similarity will often be concentrated into one or a few fragments of the sequence. The fragments of similar sequence 
that define the gene family typically encode a fragment of a protein or RNA that has an enzymatic or structural function. 
The percentage of identity in the amino acid sequence of the domain that defines the gene family is preferably at least 
70%, more preferably 80 to 95%, most preferably 85 to 99%. To search for members of a gene family within a species,
a low stringency hybridization is usually performed, but this will depend upon the size, distribution and degree of sequence
divergence of domains that define the gene family. SDFs encompassing regulatory regions can be used to
identify coordinately expressed genes by using the regulatory region sequence of the SDF as a probe.
[0116] In the instances where the SDFs are identified as being expressed from genes that confer a particular phenotype,
then the SDFs can also be used as probes to assay plants of different species for those phenotypes.

I.C. Methods to Inhibit Gene Expression

Antisense Constructs;
Ribozyme Constructs;
Chimeraplast Constructs;
Co-Suppression;
Transcriptional Silencing; and
Other Methods to Inhibit Gene Expression.

C.1. Antisense

[0118] In some instances it is desirable to suppress expression of an [...]. [...] instance is the FLAVOR-SAVOR tomato,
in which [...] inactivated by an antisense approach, thus delaying softening of the fruit after ripening. See [...]
U.S. Patent No. 5,859,330; U.S. Patent No. 5,723,766; Oeller, et al., Science, 254:437-439 [...]. [...] of flowering
can be controlled by suppression of the FLOWERING LOCUS C [...] transcript are associated with late flowering,
while absence of FLC is associated [...] (Michaels et al., Plant Cell 11:949 (1999)). Also, the transition of apical
meristem [...] associated shoots to flowering is regulated by TERMINAL FLOWER1, APETALA [...]. [...] a transition
from shoot production to flowering, it is desirable to [...] (Plant Cell 11:1007 (1999)). As another instance, arrested
ovule development and female sterility result from suppression of ethylene forming enzyme but can be reversed by
application of ethylene [...] (Plant Cell 11:1061 (1999)). The ability to manipulate female fertility of plants is useful
in increasing fruit production and creating hybrids.

[0119] In the case of polynucleotides used to inhibit [...] need not be perfectly identical [...].

[0120] Some polynucleotide SDFs in REF and SEQ TABLES 1 AND 2 represent [...] that are expressed in
corn, wheat, rice, soybean, Arabidopsis and/or other plants. Thus [...] sequences to generate antisense constructs
to inhibit [...] in a plant [...].

[0121] To accomplish this, a polynucleotide segment [...] from the desired gene (the antisense segment) [...]
transcribed when the construct [...] used in the construct to control transcription of the antisense segment
so that [...] under desired circumstances.

[0122] The antisense segment to be introduced [...] endogenous gene or genes to be [...]. Furthermore, the
introduced sequence need not have the same intron or exon pattern, and homology of non-coding segments [...].
[...] a sequence of between about 30 or 40 [...] of at least about 100 nucleotides is preferred, a sequence
[...] nucleotides is more preferred, and a sequence of at least about 500 [...].
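
As a minimal sketch of how an antisense segment relates to the gene from which it is taken, the following Python example derives the insert sequence as it would read downstream of a promoter. The cDNA sequence, the coordinates and the helper names are hypothetical; the sketch assumes that placing the reverse complement of a sense-strand segment behind the promoter yields a transcript complementary to the mRNA.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna):
    """Reverse complement of a DNA sequence."""
    return dna.translate(COMPLEMENT)[::-1]


def antisense_insert(cdna, start, length):
    """Segment of the sense-strand cDNA, oriented for antisense transcription.

    The returned sequence is what would be read 5' to 3' downstream of the
    promoter in the construct.
    """
    segment = cdna[start:start + length]
    if len(segment) < length:
        raise ValueError("segment extends past the end of the cDNA")
    return reverse_complement(segment)


def transcribe(dna):
    """RNA with the same sequence as the given DNA strand (T replaced by U)."""
    return dna.upper().replace("T", "U")


# Hypothetical sense-strand cDNA fragment.
cdna = "ATGGCCAAGCTTGGAGACCGTTACGATCCATGA"
insert = antisense_insert(cdna, 3, 18)
antisense_rna = transcribe(insert)
print(insert)
print(antisense_rna)
```

The antisense RNA produced from such an insert can pair, in antiparallel orientation, with the portion of the mRNA corresponding to the chosen segment.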



C.2. Ribozymes

[0125] A number of classes of ribozymes have been identified. One class of ribozymes is derived from a number of
small circular RNAs, which are capable of self-cleavage and replication in plants. The RNAs replicate either alone (viroid
RNAs) or with a helper virus (satellite RNAs). Examples include RNAs from avocado sunblotch viroid and the satellite
RNAs from tobacco ringspot virus, lucerne transient streak virus, velvet tobacco mottle virus, solanum nodiflorum mottle
virus and subterranean clover mottle virus. The design and use of target RNA-specific ribozymes is described in Haseloff
et al., Nature, 334:585 (1988).

[0126] Like the antisense constructs above, the ribozyme sequence fragment necessary for pairing need not be 
identical to the target nucleotides to be cleaved, nor identical to the sequences in REF AND SEQ TABLES 1 AND 2. 
Ribozymes may be constructed by combining the ribozyme sequence and some fragment of the target gene which 
would allow recognition of the target gene mRNA by the resulting ribozyme molecule. Generally, the sequence in the
ribozyme capable of binding to the target sequence exhibits at least 80%, preferably at least 85%, more preferably
at least 90%, most preferably at least 95%, and even more preferably at least 96%, 97%, 98% or 99% sequence
identity to some fragment of a sequence in REF AND SEQ
TABLES 1 AND 2 or the complement thereof. The ribozyme can be equally effective in inhibiting mRNA translation by
cleaving either in the untranslated or coding regions. Generally, a higher percentage of sequence identity can be used 
to compensate for the use of a shorter sequence. Furthermore, the introduced sequence need not have the same 
intron or exon pattern, and homology of non-coding segments may be equally effective. 



C.3. Chimeraplasts 



[0127] The SDFs of the invention, such as those described by the REF and SEQ Tables, can also be used to construct 
chimeraplasts that can be introduced into a cell to produce at least one specific nucleotide change in a sequence 
corresponding to the SDF of the invention. A chimeraplast is an oligonucleotide comprising DNA and/or RNA that 
specifically hybridizes to a target region in a manner which creates a mismatched base-pair. This mismatched base-pair
signals the cell's repair enzyme machinery which acts on the mismatched region resulting in the replacement,
insertion or deletion of designated nucleotide(s). The altered sequence is then expressed by the cell's normal cellular
mechanisms. Chimeraplasts can be designed to repair mutant genes, modify genes, introduce site-specific mutations,
and/or act to interrupt or alter normal gene function (US Pat. Nos. 6,010,907 and 6,004,804; and PCT Pub. Nos.
WO99/58723 and WO99/07865).



C.4. Sense Suppression 



[0128] The SDFs of REF and SEQ TABLES 1 AND 2 of the present invention are also useful to modulate gene 
expression by sense suppression. Sense suppression represents another method of gene suppression by introducing 
at least one exogenous copy or fragment of the endogenous sequence to be suppressed. 

[0129] Introduction of expression cassettes in which a nucleic acid is configured in the sense orientation with respect 
to the promoter into the chromosome of a plant or by a self-replicating virus has been shown to be an effective means 
by which to induce degradation of mRNAs of target genes. For an example of the use of this method to modulate 
expression of endogenous genes see, Napoli et al., The Plant Cell 2:279 (1990), and U.S. Patents Nos. 5,034,323,
5,231,020, and 5,283,184. Inhibition of expression may require some transcription of the introduced sequence.
[0130] For sense suppression, the introduced sequence generally will be substantially identical to the endogenous 
sequence intended to be inactivated. The minimal percentage of sequence identity will typically be greater than about 
65%, but a higher percentage of sequence identity might exert a more effective reduction in the level of normal gene 
products. Sequence identity of more than about 80% is preferred, though about 95% to absolute identity would be 
most preferred. As with antisense regulation, the effect would likely apply to any other proteins within a similar family 
of genes exhibiting homology or substantial homology to the suppressing sequence. 

C.5. Transcriptional Silencing 



[0131] The nucleic acid sequences of the invention, including the SDFs of REF and SEQ TABLES 1 AND 2, and
fragments thereof, contain sequences that can be inserted into the genome of an organism resulting in transcriptional
silencing. Such regulatory sequences need not be operatively linked to coding sequences to modulate transcription of
a gene. Specifically, a promoter sequence without any other element of a gene can be introduced into a genome to
transcriptionally silence an endogenous gene (see, for example, Vaucheret, H. et al. (1998) The Plant Journal 16:
651-659). As another example, triple helices can be formed using oligonucleotides based on sequences from REF
AND SEQ TABLES 1 AND 2, fragments thereof, and substantially similar sequences thereto. The oligonucleotide can
be delivered to the host cell and can bind to the promoter in the genome to form a triple helix and prevent transcription.
An oligonucleotide of interest is one that can bind to the promoter and block binding of a transcription factor to the
[...] complementary to the sequences of the promoter that interact with transcription binding factors.

C.6. Other Methods to Inhibit Gene Expression

[...] as binding sites for additional transcription factors that have the function of modulating the level of transcription with
respect to tissue specificity and of transcriptional responses to particular environmental or nutritional factors and the 
like. 

[0142] Short DNA sequences representing binding sites for proteins can be separated from each other by intervening 
sequences of varying length. For example, within a particular functional module, protein binding sites may be constituted 
by regions of 5 to 60, preferably 10 to 30, more preferably 10 to 20 nucleotides. Within such binding sites, there are
typically 2 to 6 nucleotides that specifically contact amino acids of the nucleic acid binding protein. The protein binding 
sites are usually separated from each other by 10 to several hundred nucleotides, typically by 15 to 150 nucleotides, 
often by 20 to 50 nucleotides. DNA binding sites in promoter elements often display dyad symmetry in their sequence. 
Often elements binding several different proteins, and/or a plurality of sites that bind the same protein, will be combined 
in a region of 50 to 1,000 basepairs. 
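
The dyad symmetry mentioned above can be illustrated with a short Python sketch that scans a sequence for sites identical to their own reverse complement. The example sequence and the fixed site length are hypothetical, and only perfect, unspaced dyads are detected; imperfect or spaced inverted repeats, which also occur in binding sites, are outside the scope of this sketch.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def has_dyad_symmetry(site):
    """True if the site reads the same on both strands (perfect dyad)."""
    site = site.upper()
    return site == site.translate(COMPLEMENT)[::-1]


def find_dyad_sites(sequence, length):
    """All (position, site) pairs of the given length showing perfect dyad symmetry."""
    sequence = sequence.upper()
    return [
        (i, sequence[i:i + length])
        for i in range(len(sequence) - length + 1)
        if has_dyad_symmetry(sequence[i:i + length])
    ]


# Hypothetical promoter fragment containing one six-nucleotide dyad.
fragment = "TTCACGTGAA"
sites = find_dyad_sites(fragment, 6)
print(sites)
```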

[0143] Elements that have transcription regulatory function can be isolated from their corresponding endogenous 
gene, or the desired sequence can be synthesized, and recombined in constructs to direct expression of a coding 
region of a gene in a desired tissue-specific, temporal-specific or other desired manner of inducibility or suppression. 
When hybridizations are performed to identify or isolate elements of a promoter by hybridization to the long sequences 
presented in REF AND SEQ TABLES 1 AND 2, conditions are adjusted to account for the above-described nature of 
promoters. For example, short probes, constituting the element sought, are preferably used under low temperature
and/or high salt conditions. When long probes, which might include several promoter elements are used, low to medium 
stringency conditions are preferred when hybridizing to promoters across species. 

[0144] If a nucleotide sequence of an SDF, or part of the SDF, functions as a promoter or fragment of a promoter,
then nucleotide substitutions, insertions or deletions that do not substantially affect the binding of relevant DNA binding 
proteins would be considered equivalent to the exemplified nucleotide sequence. It is envisioned that there are in- 
stances where it is desirable to decrease the binding of relevant DNA binding proteins to silence or down-regulate a 
promoter, or conversely to increase the binding of relevant DNA binding proteins to enhance or up-regulate a promoter 
and vice versa. In such instances, polynucleotides representing changes to the nucleotide sequence of the DNA-protein 
contact region by insertion of additional nucleotides, changes to identity of relevant nucleotides, including use of chem- 
ically-modified bases, or deletion of one or more nucleotides are considered encompassed by the present invention. 
In addition, fragments of the promoter sequences described by the REF and SEQ Tables and variants thereof can be 
fused with other promoters or fragments to facilitate transcription and/or transcription in specific types of cells or under
specific conditions. 

[0145] Promoter function can be assayed by methods known in the art, preferably by measuring activity of a reporter
gene operatively linked to the sequence being tested for promoter function. Examples of reporter genes include those 
encoding luciferase, green fluorescent protein, GUS, neo, cat and bar. 

I.F. UTRs and Junctions 

[0146] Polynucleotides comprising untranslated (UTR) sequences and intron/exon junctions are also within the scope
of the invention. UTR sequences include introns and 5' or 3' untranslated regions (5' UTRs or 3' UTRs). Fragments of
the sequences shown in REF AND SEQ TABLES 1 AND 2 can comprise UTRs and intron/exon junctions.
[0147] These fragments of SDFs, especially UTRs, can have regulatory functions related to, for example, translation
rate and mRNA stability. Thus, these fragments of SDFs can be isolated for use as elements of gene constructs for 
regulated production of polynucleotides encoding desired polypeptides. 

[0148] Introns of genomic DNA segments might also have regulatory functions. Sometimes regulatory elements, 
especially transcription enhancer or suppressor elements, are found within introns. Also, elements related to stability 
of heteronuclear RNA and efficiency of splicing and of transport to the cytoplasm for translation can be found in intron 
elements. Thus, these segments can also find use as elements of expression vectors intended for use to transform 
plants. 

[0149] Just as with promoters, UTR sequences and intron/exon junctions can vary from those shown in REF AND
SEQ TABLES 1 AND 2. Such changes from those sequences preferably will not affect the regulatory activity of the
UTRs or intron/exon junction sequences on expression, transcription, or translation unless selected to do so. However, 
in some instances, down- or up-regulation of such activity may be desired to modulate traits or phenotypic or in vitro 
activity. 

I.G. Coding Sequences 

[0150] Isolated polynucleotides of the invention can include coding sequences that encode polypeptides comprising
an amino acid sequence encoded by sequences in REF AND SEQ TABLES 1 AND 2 or an amino acid sequence
presented in REF AND SEQ TABLES 1 AND 2.

[0151] A nucleotide sequence encodes a polypeptide if a cell (or a cell-free in vitro system) expressing that nucleotide
sequence produces a [...] and the primary transcript is subsequently processed and translated by a host cell (or a
cell-free [...]) [...] the nucleic acid. Thus, an isolated nucleic acid that encodes [...] nucleic acid encoding an amino
acid sequence also encompasses heteronuclear RNA, which contains sequences that are spliced out during expression,
and mRNA, which lacks those sequences.

[0152] Coding sequences can be constructed using synthesis techniques or by isolating coding sequences
or by modifying such synthesized or isolated coding sequences as described above.

[0153] In addition to coding sequences encoding the polypeptide sequences of REF AND SEQ TABLES 1 AND 2,
which are native to corn, Arabidopsis, soybean, rice, wheat, and other plants, the isolated polynucleotides can [...]
20%, more preferably less than 15%; even more preferably less than 10%, 5%, 3% or 1% of the number of nucleotides
comprising a particularly exemplified sequence. It is generally expected that non-degenerate nucleotide sequence
changes that result in 1 to 10, more preferably 1 to 5 and most preferably 1 to 3 amino acid insertions, deletions or
substitutions will not greatly affect [...] wherein 1 to 20, preferably 1 to 10, most preferably 1 to 5 nucleotides are
added to, deleted from and/or substituted in the sequences specifically disclosed in REF AND SEQ TABLES 1 AND 2.

[0155] Insertions or deletions in polynucleotides intended to be used for encoding a polypeptide preferably preserve
[...] as a hybridization probe.
II. Polypeptides and Proteins

II.A. Native polypeptides and proteins

[0156] Polypeptides within the scope of the invention include both native proteins as well as variants, fragments,
and fusions thereof. Polypeptides of the invention are those encoded by any of the six reading frames of sequences
[...] REF AND SEQ TABLES [...] 75% to those native polypeptides of REF
AND SEQ TABLES 1 AND 2. More preferably, the polypeptide variants will exhibit at least 85% sequence identity;
[...] at least 90% sequence identity; more preferably at least 95%, 96%, 97%, [...] sequence
identity. Fragments of polypeptide or fragments of polypeptides will exhibit similar percentages of sequence identity to
[...] fragments of the native polypeptide. Fusions will exhibit a similar [...] sequence identity in that
fragment of the fusion represented by the variant of the native peptide.
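
The six reading frames referred to above can be enumerated as in the following Python sketch, which translates a nucleotide sequence in three frames on each strand using the standard genetic code. The input sequence and the frame labels (+1 to +3 and -1 to -3) are assumptions of this example; codons containing characters other than A, C, G or T are rendered as X.

```python
from itertools import product

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate in frame from the first base; incomplete trailing codons are dropped."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Translations of the three forward and three reverse-complement frames."""
    forward = dna.upper()
    reverse = forward.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(forward[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames


# Hypothetical nucleotide sequence.
frames = six_frame_translations("ATGGCCAAGTAA")
for label in ("+1", "+2", "+3", "-1", "-2", "-3"):
    print(label, frames[label])
```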

[0159] Furthermore, polypeptide variants will exhibit at least one of the functional properties of the native protein.
Such properties include, without limitation, protein interaction, DNA interaction, biological activity, [...]
receptor binding, signal transduction, transcription activity, growth factor activity, secondary [...]
or 95% of at least one activity of the native protein.

[0160] A variant of a native polypeptide comprises amino acid substitutions, deletions and/or insertions.
[...] substitutions are preferred to maintain the function or activity of the polypeptide.

[0161] Within the scope of percentage of sequence identity described above, a polypeptide of the invention may [...]

A.1 Antibodies

[0162] Isolated polypeptides can be utilized to produce antibodies. Polypeptides of the invention can generally be
used as antigens for raising antibodies by known techniques. The resulting antibodies are useful as
reagents for determining the distribution of the antigen protein within the tissues of a plant or [...]
The antibodies are also useful for examining the production level of proteins in various tissues, for example in a wild-type
plant or following genetic manipulation of a plant, by methods such as Western blotting.
[0163] Antibodies of the present invention, both polyclonal and monoclonal, may be prepared by conventional methods.
In general, the polypeptides of the invention are first used to immunize a suitable animal, such as a mouse, rat,
rabbit, or goat. Rabbits and goats are preferred for the preparation of polyclonal sera due to the volume of serum
obtainable, and the availability of labeled anti-rabbit and anti-goat antibodies as detection reagents. Immunization is
generally performed by mixing or emulsifying the protein in saline, preferably in an adjuvant such as Freund's complete
adjuvant, and injecting the mixture or emulsion parenterally (generally subcutaneously or intramuscularly). A dose of
50-200 µg/injection is typically sufficient. Immunization is generally boosted 2-6 weeks later with one or more injections
of the protein in saline, preferably using Freund's incomplete adjuvant. One may alternatively generate antibodies by 
in vitro immunization using methods known in the art, which for the purposes of this invention is considered equivalent 
to in vivo immunization. 

[0164] Polyclonal antisera are obtained by bleeding the immunized animal into a glass or plastic container, incubating
the blood at 25°C for one hour, followed by incubating the blood at 4°C for 2-18 hours. The serum is recovered by
centrifugation (e.g., 1,000 x g for 10 minutes). About 20-50 ml per bleed may be obtained from rabbits.
[0165] Monoclonal antibodies are prepared using the method of Kohler and Milstein, Nature 256: 495 (1975), or 
modification thereof. Typically, a mouse or rat is immunized as described above. However, rather than bleeding the 
animal to extract serum, the spleen (and optionally several large lymph nodes) is removed and dissociated into single 
cells. If desired, the spleen cells can be screened (after removal of nonspecifically adherent cells) by applying a cell 
suspension to a plate, or well, coated with the protein antigen. B-cells producing membrane-bound immunoglobulin 
specific for the antigen bind to the plate, and are not rinsed away with the rest of the suspension. Resulting B-cells, or 
all dissociated spleen cells, are then induced to fuse with myeloma cells to form hybridomas, and are cultured in a 
selective medium (e.g., hypoxanthine, aminopterin, thymidine medium, "HAT"). The resulting hybridomas are plated by
limiting dilution, and are assayed for the production of antibodies which bind specifically to the immunizing antigen 
(and which do not bind to unrelated antigens). The selected Mab-secreting hybridomas are then cultured either in vitro 
(e.g., in tissue culture bottles or hollow fiber reactors), or in vivo (as ascites in mice). 

[0166] Other methods for sustaining antibody-producing B-cell clones, such as by EBV transformation, are known. 
[0167] If desired, the antibodies (whether polyclonal or monoclonal) may be labeled using conventional techniques. 
Suitable labels include fluorophores, chromophores, radioactive atoms (particularly 32P and 125I), electron-dense reagents,
enzymes, and ligands having specific binding partners. Enzymes are typically detected by their activity. For
example, horseradish peroxidase is usually detected by its ability to convert 3,3',5,5'-tetramethylbenzidine (TMB) to a
blue pigment, quantifiable with a spectrophotometer. 

A.2 In Vitro Applications of Polypeptides

[0168] Some polypeptides of the invention will have enzymatic activities that are useful in vitro. For example, the 
soybean trypsin inhibitor (Kunitz) family is one of the numerous families of proteinase inhibitors. It comprises plant 
proteins which have inhibitory activity against serine proteinases from the trypsin and subtilisin families, thiol protein- 
ases and aspartic proteinases. Thus, these peptides find in vitro use in protein purification protocols and perhaps in 
therapeutic settings requiring topical application of protease inhibitors. 

[0169] Delta-aminolevulinic acid dehydratase (EC 4.2.1.24) (ALAD) catalyzes the second step in the biosynthesis 
of heme, the condensation of two molecules of 5-aminolevulinate to form porphobilinogen, and is also involved in
chlorophyll biosynthesis (Kaczor et al. (1994) Plant Physiol. 104: 1411-7; Smith (1988) Biochem. J. 249: 423-8; Schneider
(1976) Z. Naturforsch. [C] 31: 55-63). Thus, ALAD proteins can be used as catalysts in synthesis of heme derivatives.
Enzymes of biosynthetic pathways generally can be used as catalysts for in vitro synthesis of the compounds repre- 
senting products of the pathway. 

[0170] Polypeptides encoded by SDFs of the invention can be engineered to provide purification reagents to identify
and purify additional polypeptides that bind to them. This allows one to identify proteins that function as multimers or
elucidate signal transduction or metabolic pathways. In the case of DNA binding proteins, the polypeptide can be used
in a similar manner to identify the DNA determinants of specific binding (S. Pierrou et al., Anal. Biochem. 229:99 (1995),
S. Chusacultanachai et al., J. Biol. Chem. 274:23591 (1999), Q. Lin et al., J. Biol. Chem. 272:27274 (1997)).

II.B. POLYPEPTIDE VARIANTS, FRAGMENTS, AND FUSIONS

[0171] Generally, variants, fragments, or fusions of the polypeptides encoded by the maximum length sequence
(MLS) can exhibit at least one of the activities of the identified domains and/or related polypeptides described in Sections
(C) and (D) of REF TABLES 1 and 2 corresponding to the MLS of interest.

II.B.(1) VARIANTS



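[...] conservation of charge, polarity, hydrophobicity, size, etc. For example, one or more amino acid residues within the
sequence can be substituted with another amino acid of similar polarity that acts as a functional equivalent, for example
providing a hydrogen bond in an enzymatic catalysis. Substitutes for an amino acid within an exemplified sequence
are preferably made among the members of the class to which the amino acid belongs. For example, the nonpolar
(hydrophobic) amino acids include alanine, leucine, isoleucine, valine, proline, phenylalanine, tryptophan and methionine.
The polar neutral amino acids include glycine, serine, threonine, cysteine, tyrosine, asparagine, and glutamine.
The positively charged (basic) amino acids include arginine, lysine and histidine. The negatively charged (acidic) amino
acids include aspartic acid and glutamic acid.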

else -sr^s^^ u s^~^ a - - -» 

[0176] Yet another class of variants includes those that lack one of the in vitro activities, or structural features of the
encoded polypeptides. One example is polypeptides or proteins produced from genes comprising dominant negative
mutations. Such a variant may comprise an MLS polypeptide sequence with non-conservative changes in a particular
domain or group of conserved residues.



II.A.(2) FRAGMENTS 



seived between an MLS encoded polypeptide m toSlaZS^L^" 5 ! <*#><<* <>***k L,- 



II.A.(3) FUSIONS



E, 1 1^^^^^^^ ~«- ^Peptide or vanants ^ of 
MLS of me invention fused to second AP2 SZXS? 1^ """■«* *" AP2 helix e ^ by a 
The present invention also encompasses fusions of MLS encoded polypeptides, variants, or fragments thereof fused with related
proteins or fragments thereof.



DEFINITION OF DOMAINS



dontalns MM. to MLS «^^SSt^J^S^ T ^ TABlES ' 2 '» ■*"». to 

(http://pfam.wustl.edu/browse.shtml).

1. (AAA) AAA-protein family signature

Proteins containing two AAA domains:

- Mammalian and Drosophila NSF (N-ethylmaleimide-sensitive fusion protein) and the fungal homolog, SEC18.
These proteins are involved in intracellular transport between the endoplasmic reticulum and Golgi, as well as
between different Golgi cisternae.

- Mammalian transitional endoplasmic reticulum ATPase (previously known as p97 or VCP), which is involved in the
transfer of membranes from the endoplasmic reticulum to the Golgi apparatus. This protein forms a ring-shaped
homooligomer composed of six subunits. The yeast homolog is CDC48 and it may play a role in spindle pole
proliferation.

- Yeast protein PAS1, essential for peroxisome assembly, and the related protein PAS1 from Pichia pastoris.

- Yeast protein AFG2.

- Sulfolobus acidocaldarius protein SAV and Halobacterium salinarium cdcH, which may be part of a transduction
pathway connecting light to cell division.

[0182] Proteins containing a single AAA domain:

- Escherichia coli and other bacteria ftsH (or hflB) protein. FtsH is an ATP-dependent zinc metallopeptidase that
seems to degrade the heat-shock sigma-32 factor.

[0183] It is an integral membrane protein with a large cytoplasmic C-terminal domain that contains both the AAA and
the protease domains.

- Yeast protein YME1, a protein important for maintaining the integrity of the mitochondrial compartment. YME1 is
also a zinc-dependent protease.

- Yeast protein AFG3 (or YTA10). This protein also seems to contain an AAA domain followed by a zinc-dependent
protease domain.

[0184] Subunits from the regulatory complex of the 26S proteasome [6], which is involved in the ATP-dependent
degradation of ubiquitinated proteins: 

a) Mammalian subunit 4 and homologs in other higher eukaryotes, in yeast (gene YTA5) and fission yeast (gene 
mts2). 

b) Mammalian subunit 6 (TBP7) and homologs in other higher eukaryotes and in yeast (gene YTA2). 

c) Mammalian subunit 7 (MSS1) and homologs in other higher eukaryotes and in yeast (gene CIM5 or YTA3). 

d) Mammalian subunit 8 (P45) and homologs in other higher eukaryotes and in yeast (SUG1 or CIM3 or TBY1)
and fission yeast (gene let1).

[0185] Other probable subunits such as human TBP1, which seems to influence HIV gene expression by interacting
with the virus tat transactivator protein, and yeast YTA1 and YTA6.

- Yeast protein BCS1, a mitochondrial protein essential for the expression of the Rieske iron-sulfur protein.

- Yeast protein MSP1, a protein involved in intramitochondrial sorting of proteins.

- Yeast protein PAS8, and the corresponding proteins PAS5 from Pichia pastoris and PAY4 from Yarrowia lipolytica.

- Mouse protein SKD1 and its fission yeast homolog (SpAC2G11.06).

- Caenorhabditis elegans meiotic spindle formation protein mei-1.

- Yeast protein SAP1.

- Yeast protein YTA7.

- Mycobacterium leprae hypothetical protein A2126A.

[0186] It is proposed that, in general, the AAA domains in these proteins act as ATP-dependent protein clamps [5]. 
In addition to the ATP-binding 'A' and 'B' motifs, which are located in the N-terminal half of this domain, there is a highly
conserved region located in the central part of the domain which was used to develop a signature pattern. 
Consensus pattern: [LIVMT]-x-[LIVMT]-[LIVMF^ 

[1] Froehlich K.-U., Fries H.W., Ruediger M., Erdmann R., Botstein D., Mecke D. J. Cell Biol. 114:443-453(1991). 
[2] Erdmann R., Wiebel F.F., Flessau A., Rytka J., Beyer A., Froehlich K.-U., Kunau W.-H. Cell 64:499-510(1991).
[3] Peters J.-M., Walsh M.J., Franke W.W. EMBO J. 9:1757-1767(1990).

[4] Kunau W.-H., Beyer A., Goette K., Marzioch M., Saidowsky J., Skaletz-Rorowski A., Wiebel F.F. Biochimie 75:
209-224(1993).

See also descriptions of ABC Tran, below, and ABC2 membrane.

3. (ABC Tran) 

ABC transporters family signature 

[0187] On the basis of sequence similarities, a family of related ATP-binding proteins has been characterized [1 to 5].
These proteins are associated with a variety of distinct biological processes in both prokaryotes and eukaryotes, but a
majority of them are involved in active transport of small hydrophilic molecules across the cytoplasmic membrane. All
these proteins share a conserved domain of some two hundred amino acid residues, which includes an ATP-binding
site. These proteins are collectively known as ABC transporters or traffic ATPases. Proteins known to belong to this family
are listed below (references are only provided for recently determined sequences). In prokaryotes: - Active transport systems
components: a.kytphosphlate ^^2^^^^ " ^ ,ems 

d PP F); ferric enterobactin (fepC); Lichrome (fhuC) SacS Ziv f J ^T, (artP): dipep,ide ^'AD.'dppD/ 
PC); glycine betaine/L-proline (proV)- glutamate/asoatS S^iT\ 9 -^£ n " ne ^ 9* ce '°'-3-Phosphate (ug- 
lactose (lacK); .eucine^soleuche^alfne ^SSSSSS 22£! F^""*! 0 ^ ^ dicitrate CecE); 
"*E>; oligopeptide <amiE/amiF;op P D^ i™? ™* M ™™ O*** nickel (nikD/ 

spermidine/putrescine (potA); sXeS) vS^T^ S5' P ? 6Ph f ^ PUtreSCi " e (potG > : ribose <*«A>: 
and IktB. - Colicin V export protein l^^l j ^ " Hemo| y s ^^kotoxin export proteins hfyB. cyaB 

(nisin) and spaT (subtil^- ^Z^o^tTJ^LT. ^J? 't™^ 
aprD. - Beta-(l. 2 ). gluC an export prote^raTd n^^ 

protein bexA. - Cytochrome c biogenesis protein cx^ fa^kTl f ^P^'^saccharide export 

protein kpsT. - Ce.l division assorted ftsE Mto c'cSL " PB ** fc aCid ,ranSport 

domonas stutzeri. - Modulation protein nodi from Fftto^Z process,n 9 ***** nosF from Pseu- 

cydD. - Subunit A of the ABC eLbn Z^EZ^*7£ """^ " «* P '°« ei " s «** and 

epidermidis (gene msrA). - Tylosin resistant orieti frZS * res,stence P«**> from Staphylococcus 

Nonprotein (genehetijfromAnaciTn^^ 

of a high affinity transport system. - yhbG a putative ^TITJT MyC ° P ^ Sma a Probable component 

Escherichia co.i. Klebsie.la pneumonia T Pseu^nifo^ Rht^ ' n ^ n,,A h ^ baCteria such as 
Escherichia co.i and related bacteria hypomeSZoTl, me " ,0t ' a " d ThW ««"» 'errooxidans. - 
VhiG, yhiH. yjcW. yjjK. yoj.. yrbF and Xe^T^^' ™* ff* W ^ 
of closely related proteins which extrude a wide vS* of ££2^^32T ' ' (P - 9lyCopro, ™>- a ,am "V 
transmembrane conductance regulator (CFTR) whfch S ™2 n^fli k ( ! feV ' eW 889 ^ ' **** 
Antigen peptide transporters! (TAP1 PSR rYngTS^ TT^T^Z^ ,he ,ranSp ° rt of chtori *» " 
are invofved in the transport ^m^^^i^^^ ^ P f * R,NG1 1 • HAM * ^p2). which 
MHC cfcss I molecules. - 70 Kd peLisom^ memTra^e p7o t^e^^ *» 
X-linkedadrenoi e ukodystr^hy[9J.-SuffonylureareceTtLn^ I ' 8 perox,somal P ro ^i" involved in 

channel. - Drosophila proteL white StZT ' £ '! P Utat,Vesubun,to,,heB < e "ATP-sensrtive potassium 
Pigments. - Fungal ^SSmS^ ^ ° f 

one. - Yeast mitochondrL transporter ATM?'- vSS?^S'r^^ ,W ^ Wrt ^ a ^P^»^ 
protein (gene PDR5 or STS1 0^1^*^^^^^ ] ^ ^ " YeaS ' s P°^min resistance 
invofved in the transport of metal-bound ZZ tT^'S'T ^ ^ ^ is probab, V 

hba 2) . - Fission yeas, leptomycin B resistance V^J^r?£T? A T^™* ^ ^ bW ° f 
Lrverwort. - Prestalk-specific protein tagB from stiZmJuZZTrl 1 ' P ' hypo,he,,cal ^loroplast protein from 

one or two copies of me ATP-binding motifs A an^B' " '° n9in9 ,£> ^ ^ a,SO contain 
[ 1] Higgins C.F., Hyde S.C., Mimmack M.M., Gileadi U., Gill D.R., Gallagher M.P. J. Bioenerg. Biomembr. 22:
571-592(1990).

[ 2] Higgins C.F., Gallagher M.P., Mimmack M.M., Pearce S.R. BioEssays 8:111-116(1988).

[ 3] Higgins C.F., Hiles I.D., Salmond G.P.C., Gill D.R., Downie J.A., Evans I.J., Holland I.B., Gray L., Buckel S.D.,
Bell A.W., Hermodson M.A. Nature 323:448-450(1986).

[ 4] Doolittle R.F., Johnson M.S., Husain I., van Houten B., Thomas D.C., Sancar A. Nature 323:451-453(1986).
[ 5] Blight M.A., Holland I.B. Mol. Microbiol. 4:873-880(1990).

[ 6] Stoddard G.W., Petzel J.P., van Belkum M.J., Kok J., McKay L.L. Appl. Environ. Microbiol. 58:1952-1961(1992).
[ 7] Rosteck P.R. Jr., Reynolds P.A., Hershberger C.L. Gene 102:27-32(1991).
[ 8] Gottesman M.M., Pastan I. J. Biol. Chem. 263:12163-12166(1988).
[ 9] Valle D., Gaertner J. Nature 361:682-683(1993).

[10] Aguilar-Bryan L., Nichols C.G., Wechsler S.W., Clement J.P. IV, Boyd A.E. III, Gonzalez G., Herrera-Sosa H.,
Nguy K., Bryan J., Nelson D.A. Science 268:423-426(1995).

4. (ACBP) 

Acyl-CoA-binding protein signature 

[0189] Acyl-CoA-binding protein (ACBP) is a small (10 Kd) protein that binds medium- and long-chain acyl-CoA
esters with very high affinity and may function as an intracellular carrier of acyl-CoA esters [1]. ACBP is also known
as diazepam binding inhibitor (DBI) or endozepine (EP) because of its ability to displace diazepam from the
benzodiazepine (BZD) recognition site located on the GABA type A receptor. It is therefore possible that this protein also acts
as a neuropeptide to modulate the action of the GABA receptor [2]. ACBP is a highly conserved protein of about 90
residues that has been so far found in vertebrates, insects and yeast. ACBP is also related to the N-terminal section
of a probable transmembrane protein of unknown function which has been found in mammals. As a signature pattern,
the region that corresponds to residues 19 to 37 in mammalian ACBP was selected.
Consensus pattern: P-[STA]-x-[DEN]-x-[LIVMF]-x(2)-[LIVMFY]-Y-[GSTA]-x-[FY]-K-Q-[STA](2)-x-G

[ 1] Rose T.M., Schultz E.R., Todaro G.J. Proc. Natl. Acad. Sci. U.S.A. 89:11287-11291(1992).
[ 2] Costa E., Guidotti A. Life Sci. 49:325-344(1991). 
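
Consensus patterns of the kind listed in this section can be scanned against a protein sequence by translating them into regular expressions. The following is a minimal sketch, assuming the usual PROSITE conventions (`x` for any residue, `[...]` for allowed residues, `{...}` for excluded residues, `(n)` or `(n,m)` for repeats); it handles only that subset of the syntax. The function names and the toy sequence are hypothetical and were constructed only so that the ACBP signature falls at residues 19 to 37; the sequence is not a real ACBP.

```python
import re

# One pattern element: a residue specification with an optional repeat count.
ELEMENT = re.compile(r"^(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?$")


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style consensus pattern into a Python regex string."""
    parts = []
    for element in pattern.split("-"):
        match = ELEMENT.match(element.strip())
        if match is None:
            raise ValueError(f"unrecognized pattern element: {element!r}")
        residue, low, high = match.groups()
        if residue == "x":
            residue = "."  # any residue
        elif residue.startswith("{"):
            residue = "[^" + residue[1:-1] + "]"  # excluded residues
        if low is None:
            repeat = ""
        elif high is None:
            repeat = "{" + low + "}"
        else:
            repeat = "{" + low + "," + high + "}"
        parts.append(residue + repeat)
    return "".join(parts)


def scan(pattern: str, sequence: str):
    """Return (first residue, last residue, matched text) using 1-based positions."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.end(), m.group()) for m in regex.finditer(sequence)]


ACBP_PATTERN = "P-[STA]-x-[DEN]-x-[LIVMF]-x(2)-[LIVMFY]-Y-[GSTA]-x-[FY]-K-Q-[STA](2)-x-G"

# Hypothetical toy sequence: 18 filler residues, a 19-residue signature, 3 filler residues.
TOY_SEQUENCE = "M" + "K" * 17 + "PAADALAALYGAFKQATAG" + "EEE"

if __name__ == "__main__":
    print(prosite_to_regex(ACBP_PATTERN))
    for hit in scan(ACBP_PATTERN, TOY_SEQUENCE):
        print(hit)
```

The ACBP pattern describes 19 positions, which is consistent with the statement that it corresponds to residues 19 to 37 of mammalian ACBP.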

5. (AIRS) 

AIR synthase related proteins 

[0190] This family includes Hydrogen expression/formation protein HypE, AIR synthases, FGAM synthase and se- 
lenide, water dikinase. 

6. (AMP-binding) 

Putative AMP-binding domain signature 

[0191] It has been shown [1 to 5] that a number of prokaryotic and eukaryotic enzymes which all probably act via an
ATP-dependent covalent binding of AMP to their substrate, share a region of sequence similarity. These enzymes are:
- Insect luciferase (luciferin 4-monooxygenase). Luciferase produces light by catalyzing the oxidation of luciferin in
presence of ATP and molecular oxygen. - Alpha-aminoadipate reductase from yeast (gene LYS2). This enzyme
catalyzes the activation of alpha-aminoadipate by ATP-dependent adenylation and the reduction of activated
alpha-aminoadipate by NADPH. - Acetate-CoA ligase (acetyl-CoA synthetase), an enzyme that catalyzes the formation of
acetyl-CoA from acetate and CoA. - Long-chain-fatty-acid-CoA ligase, an enzyme that activates long-chain fatty acids for
both the synthesis of cellular lipids and their degradation via beta-oxidation. - 4-coumarate-CoA ligase (4CL), a plant
enzyme that catalyzes the formation of 4-coumarate-CoA from 4-coumarate and coenzyme A; the branchpoint reactions
between general phenylpropanoid metabolism and pathways leading to various specific end products. -
O-succinylbenzoic acid-CoA ligase (OSB-CoA synthetase) (gene menE) [6], a bacterial enzyme involved in the biosynthesis of
menaquinone (vitamin K2). - 4-Chlorobenzoate-CoA ligase (EC 6.2.1.-) (4-CBA-CoA ligase) [7], a Pseudomonas
enzyme involved in the degradation of 4-CBA. - Indoleacetate-lysine ligase (IAA-lysine synthetase) [8], an enzyme
from Pseudomonas syringae that converts indoleacetate to IAA-lysine. - Bile acid-CoA ligase (gene baiB) from
Eubacterium strain VPI 12708 [4]. This enzyme catalyzes the ATP-dependent formation of a variety of C-24 bile acid-CoA. -
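Crotonobetaine/carnitine-CoA ligase (EC 6.3.2.-) from Escherichia coli (gene caiC). -
L-(alpha-aminoadipyl)-L-cysteinyl-D-valine synthetase (ACV synthetase) from various fungi (gene acvA or pcbAB). This enzyme catalyzes the first
step in the biosynthesis of penicillin and cephalosporin, the formation of ACV from the constituent amino acids. The
amino acids seem to be activated by adenylation. It is a protein of around 3700 amino acids that contains three related
domains of about 1000 amino acids. - Gramicidin S synthetase I (gene grsA) from Bacillus brevis. This enzyme
catalyzes the first step in the biosynthesis of the cyclic antibiotic gramicidin S, the ATP-dependent racemization of
phenylalanine. - Tyrocidine synthetase I (gene tycA) from Bacillus brevis. The reaction carried out by tycA is identical to that
catalyzed by grsA. - Gramicidin S synthetase II (gene grsB) from Bacillus brevis. This enzyme is a multifunctional protein
that activates and polymerizes proline, valine, ornithine and leucine. GrsB consists of four related domains. -
Enterobactin synthetase components E (gene entE) and F (gene entF) from Escherichia coli. These two enzymes are involved
in the ATP-dependent activation of respectively 2,3-dihydroxybenzoate and serine during enterobactin (enterochelin)
biosynthesis. - Cyclic peptide antibiotic surfactin synthase subunits 1, 2 and 3 from Bacillus subtilis. Subunits 1 and 2
contain three related domains while subunit 3 only contains a single domain. [...]

[...] These
proteins are: - ORA (octapeptide-repeat antigen), a Plasmodium falciparum protein whose function is not known but
which shows a high degree of similarity with the above proteins. - AngR, a Vibrio anguillarum protein. AngR is thought
to be a transcriptional activator which modulates the anguibactin (an iron-binding siderophore) biosynthesis gene
cluster operon. But it is believed [9] that angR is not a DNA-binding protein, but rather an enzyme involved in the
biosynthesis of anguibactin. This conclusion is based on three facts: the presence of the AMP-binding domain; the size of
angR (1048 residues), which is far bigger than any bacterial transcriptional protein; and the presence of a probable
S-acyl thioesterase immediately downstream of angR. - A hypothetical protein in the mmsB 3' region in Pseudomonas
aeruginosa. - Escherichia coli hypothetical protein [...]. - Yeast hypothetical protein [...]. - Yeast hypothetical protein
YBR222C. - Yeast hypothetical protein [...]. All these proteins contain a highly conserved region very rich in
glycine, serine, and threonine [...].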

[ 1] Toh H. Protein Seq. Data Anal. 4:111-117(1991).

[ 2] Smith D.J., Earl A.J., Turner G. EMBO J. 9:2743-2750(1990).
[ 3] Schroeder J. Nucleic Acids Res. 17:460-460(1989).

[ 4] Mallonee D.H., Adams J.L., Hylemon P.B. J. Bacteriol. 174:2065-2071(1992).

[ 5] Turgay K., Krause M., Marahiel M.A. Mol. Microbiol. 6:529-546(1992).
[ 6] Driscoll J.R., Taber H.W. J. Bacteriol. 174:5063-5071(1992).

^"K, ^^ M ' J ° ■ «- K.-H.. Liang P.-H., 

[ 8] Farrell D.H., Mikesell P., Actis L.A., Crosa J.H. Gene 86:45-51(1990).

7. AP2 domain 

[0193] This 60 amino acid residue domain can bind to DNA [1]. This domain is plant specific. Members of this family
are suggested to be related to pyridoxal phosphate-binding domains. AP2 domains
are also described in Jofuku et al., copending U.S. Patent applications 08/700,152, 08/879,827, 08/912,272,
09/026,039.

[1] Ohme-Takagi M, Shinshi H; Plant Cell 1995;7:173-182.

[2] Weigel D; Plant Cell 1995;7:388-389.

[3] Mushegian AR, Koonin EV; Genetics 1996;144:817-828.



8. ARID

9. (ATP synt)

ATP synthase gamma subunit signature 

[0195] ATP synthase (proton-translocating ATPase) (EC 3.6.1.34) [1,2] is a component of the cytoplasmic membrane
of eubacteria, the inner membrane of mitochondria, and the thylakoid membrane of chloroplasts. The ATPase complex
is composed of an oligomeric transmembrane sector, called CF(0), and a catalytic core, called coupling factor CF(1).
The former acts as a proton channel; the latter is composed of five subunits, alpha, beta, gamma, delta and epsilon.
Subunit gamma is believed to be important in regulating ATPase activity and the flow of protons through the CF(0)
complex. The best conserved region of the gamma subunit [3] is its C-terminus, which seems to be essential for
assembly and catalysis. As a signature pattern to detect ATPase gamma subunits, a 14 residue conserved segment where
the last amino acid is found one to three residues from the C-terminal extremity was used.

[0196] Consensus pattern: [IV]-T-x-E-x(2)-[DE]-x(3)-G-A-x-[SAKR]. Note: Pea chloroplast gamma and two Bacillus
species gamma subunits are not detected by this motif.

[ 1] Futai M., Noumi T., Maeda M. Annu. Rev. Biochem. 58:111-136(1989).
[ 2] Senior A.E. Physiol. Rev. 68:177-231(1988).

[ 3] Miki J., Maeda M., Mukohata Y., Futai M. FEBS Lett. 232:221-226(1988).

10. (ATP Synt A)
ATP synthase a subunit signature

[0197] ATP synthase (proton-translocating ATPase) (EC 3.6.1.34) [1,2] is a component of the cytoplasmic membrane
of eubacteria, the inner membrane of mitochondria, and the thylakoid membrane of chloroplasts. The ATPase complex
is composed of an oligomeric transmembrane sector, called CF(0), which acts as a proton channel, and a catalytic
core, termed coupling factor CF(1). The CF(0) a subunit, also called protein 6, is a key component of the proton channel;
it may play a direct role in translocating protons across the membrane. It is a highly hydrophobic protein that has been
predicted to contain 8 transmembrane regions [3]. Sequence comparison of a subunits from all available sources reveals
very few conserved regions. The best conserved region is located in what is predicted to be the fifth transmembrane 
domain. This region contains three perfectly conserved residues: an arginine, a leucine and an asparagine. Mutagen- 
esis experiments of ATPase activity. This region was selected as a signature pattern. 

Consensus pattern: [STAGN]-x-[STAG]-[LIVMF]-R-L-x-[SAGV]-N-[LIVMT] [R is important for proton translocation]

[ 1] Futai M., Noumi T, Maeda M. Annu. Rev. Biochem. 58:111-136(1989). 
[ 2] Senior A.E. Physiol. Rev. 68:177-231(1988). 

[ 3] Lewis M.L, Chang J.A., Simoni R.D. J. Biol. Chem. 265:10541-10550(1990). 
[ 4] Cain B.D., Simoni R.D. J. Biol. Chem. 264:3292-3300(1989). 

11. ATP synthase B 

[0198] Part of the CF(0) (base unit) of the ATP synthase. The base unit is thought to translocate protons through
the membrane (inner membrane in mitochondria, thylakoid membrane in plants, cytoplasmic membrane in bacteria). The
B subunits are thought to interact with the stalk of the CF(1) subunits.

12. (ATP synt C) 

ATP synthase c subunit signature 

[0199] ATP synthase (proton-translocating ATPase) [1,2] is a component of the cytoplasmic membrane of eubacteria,
the inner membrane of mitochondria, and the thylakoid membrane of chloroplasts. The ATPase complex is composed
of an oligomeric transmembrane sector, called CF(0), which acts as a proton channel, and a catalytic core, termed
coupling factor CF(1). The CF(0) c subunit (also called protein 9, proteolipid, or subunit III) [3,4] is a highly hydrophobic
protein of about 8 Kd which has been implicated in the proton-conducting activity of ATPase. Structurally, subunit c
consists of two long terminal hydrophobic regions, which probably span the membrane, and a central hydrophilic region.
N,N'-dicyclohexylcarbodiimide (DCCD) can bind covalently to subunit c and thereby abolish the ATPase activity. DCCD
binds to a specific glutamate or aspartate residue which is located in the middle of the second hydrophobic region near
the C-terminus of the protein. A signature pattern which includes the DCCD-binding residue was derived.

[0200] Consensus pattern: [GSTA]-R-[NQ]-P-x(10)-...-[DE]

[ 1] Futai M., Noumi T., Maeda M. Annu. Rev. Biochem. 58:111-136(1989).

[ 2] Senior A.E. Physiol. Rev. 68:177-231(1988).

[ 3] Ivaschenko A.T., Karpenyuk T.A., Ponomarenko S.V. Biokhimiia 56:406-419(1991).
[ 4] Recipon H., Perasso R., Adoutte A., Quetier F. J. Mol. Evol. 34:292-303(1992).

13. (ATPsyntDE) 

ATP synthase, Delta/Epsilon chain 

14. (ATP syntab) 

ATP synthase alpha and beta subunits signature 

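The ATPase complex is composed of an oligomeric transmembrane sector, called CF(0), and a catalytic core, called
coupling factor CF(1). The former acts as a proton channel; the latter is composed of five subunits, alpha, beta, gamma,
delta and epsilon. The sequences of subunits alpha and beta are related and both contain a nucleotide-binding site
for ATP and ADP. Vacuolar ATPases [3] (V-ATPases) are responsible for
acidifying a variety of intracellular compartments in eukaryotic cells. Like F-ATPases, they are oligomeric complexes
of a transmembrane and a catalytic sector. The sequence of the largest subunit of the catalytic sector (70 Kd) is related
to that of the F-ATPase beta subunit, while a 60 Kd subunit, from the same sector, is related to the F-ATPase alpha subunit
[4]. Archaebacterial membrane-associated ATPases are composed of three subunits. The alpha chain is related to the
F-ATPase beta chain and the beta chain is related to the F-ATPase alpha chain [4]. A protein highly similar to F-ATPase
beta subunits is found [5] in some bacterial apparatus involved in a specialized protein export pathway that proceeds
without signal peptide cleavage. This protein is known as fliI in Bacillus and Salmonella, Spa47 (mxiB) in Shigella
flexneri, HrpB6 in Xanthomonas campestris and yscN in Yersinia virulence plasmids. To detect these ATPase subunits,
a segment of ten amino-acid residues, containing two conserved serines, was used as a signature pattern. The first serine
seems to be important for catalysis, in the ATPase alpha chain at least, as its mutagenesis causes catalytic
impairment.

[0203] Consensus pattern: P-[SAP]-[LIV]-[DNH]-x(3)-S-x-S [The first S is a putative active site residue]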

[ 1] Futai M., Noumi T., Maeda M. Annu. Rev. Biochem. 58:111-136(1989).
[ 2] Senior A.E. Physiol. Rev. 68:177-231(1988).

[ 3] Nelson N. J. Bioenerg. Biomembr. 21:553-571(1989).

15. (ATP syntab C) 

ATP synthase ab C terminal. 
[0204] Number of members: 190

[ 2££Z1 w£££5: ™*~ JE: s,ructure at 28 A ■~ hfcn - f,atp - h eart m Ho . 

16. (A deaminase) 

Adenosine and AMP deaminase signature 

[0205] Adenosine deaminase catalyzes the hydrolytic deamination of adenosine into inosine. AMP deaminase
catalyzes the hydrolytic deamination of AMP into IMP. It has been shown [1] that these two types of enzymes share three
regions of sequence similarities; these regions are centered on residues which are proposed to play an important role
in the catalytic mechanism of these two enzymes. One of these regions, containing two conserved aspartic acid residues
that are potential active site residues, was selected.

Consensus pattern: [SA]-[LIVM]-[NGS]-[STA]-D-D-P [The two D's are putative active site residues]
[1] Chang Z., Nygaard P., Chinault A.C., Kellems R.E. Biochemistry 30:2273-2280(1991).

17. (Acetyltransf) 

Acetyltransferase (GNAT) family. 

[0206] This family contains proteins with N-acetyltransferase functions. 
[1] Neuwald AF, Landsman D; Trends Biochem Sci 1997;22:154-155.

18. (Aconitase C)

Aconitase family signature 

[0207] Aconitase (aconitate hydratase) (EC 4.2.1.3) [1] is the enzyme from the tricarboxylic acid cycle that catalyzes 
the reversible isomerization of citrate and isocitrate. Cis-aconitate is formed as an intermediary product during the 
course of the reaction. In eukaryotes two isozymes of aconitase are known to exist: one found in the mitochondrial 
matrix and the other found in the cytoplasm. Aconitase, in its active form, contains a 4Fe-4S iron-sulfur cluster; three 
cysteine residues have been shown to be ligands of the 4Fe-4S cluster. It has been shown that the aconitase family
also contains the following proteins: - Iron-responsive element binding protein (IRE-BP). IRE-BP is a cytosolic protein
that binds to iron-responsive elements (IREs). IREs are stem-loop structures found in the 5'UTR of ferritin and delta
aminolevulinic acid synthase mRNAs, and in the 3'UTR of transferrin receptor mRNA. IRE-BP also expresses aconitase
activity. - 3-isopropylmalate dehydratase (EC 4.2.1.33) (isopropylmalate isomerase), the enzyme that catalyzes the
second step in the biosynthesis of leucine. - Homoaconitase (EC 4.2.1.36) (homoaconitate hydratase), an enzyme that
participates in the alpha-aminoadipate pathway of lysine biosynthesis and that converts cis-homoaconitate into
homoisocitric acid. - Escherichia coli protein ybhJ. As a signature for proteins from the aconitase family, two conserved
regions that contain the three cysteine ligands of the 4Fe-4S cluster were selected.

Consensus pattern: [LIVM]-x(2)-[GSACIVM]-x-[LIV]-[GTIV]-[STP]-C-x(0,1)-T-N-[GSTANI]-x(4)-[LIVMA] [C binds the
iron-sulfur center]

Consensus pattern: G-x(2)-[LIVWPQ]-x(3)-[GAC]-C-[GSTAM]-[LIMPTA]-C-[LIMV]-[GA] [The two C's bind the
iron-sulfur center]

[ 1] Gruer M.J., Artymiuk P.J., Guest J.R. Trends Biochem. Sci. 22:3-6(1997). 
19. (Acyl-CoA dh) 

Acyl-CoA dehydrogenases signatures 

[0208] Acyl-CoA dehydrogenases [1,2,3] are enzymes that catalyze the alpha,beta-dehydrogenation of acyl-CoA
esters and transfer electrons to ETF, the electron transfer protein. Acyl-CoA dehydrogenases are FAD flavoproteins.
This family currently includes: - Five eukaryotic isozymes that catalyze the first step of the beta-oxidation cycles for
fatty acids with various chain lengths. These are short (SCAD) (EC 1.3.99.2), medium (MCAD) (EC 1.3.99.3), long
(LCAD) (EC 1.3.99.13), very-long (VLCAD) and short/branched (SBCAD) chain acyl-CoA dehydrogenases. These
enzymes are located in the mitochondrion. They are all homotetrameric proteins of about 400 amino acid residues
except VLCAD, which is a dimer and which contains, in its mature form, about 600 residues. - Glutaryl-CoA
dehydrogenase (EC 1.3.99.7) (GCDH), which is involved in the catabolism of lysine, hydroxylysine and tryptophan. -
Isovaleryl-CoA dehydrogenase (EC 1.3.99.10) (IVD), involved in the catabolism of leucine. - Acyl-CoA dehydrogenases acsA and
mmgC from Bacillus subtilis. - Butyryl-CoA dehydrogenase (EC 1.3.99.2) from Clostridium acetobutylicum. -
Escherichia coli protein caiA [4]. - Escherichia coli protein aidB. Two conserved regions were selected as signature
patterns. The first is located in the center of these enzymes, the second in the C-terminal section.
Consensus pattern: [GAC]-[LIVM]-[ST]-E-x(2)-[GSAN]-G-[ST]-D-x(2)-[GSA]
Consensus pattern: [QDE]-x(2)-G-[GS]-x-G-[LIVMFY]-x(2)-[DEN]-x(4)-[KR]-x(3)-[DEN]

[ 1] Tanaka K., Ikeda Y., Matsubara Y., Hyman D.B. Enzyme 38:91-107(1987).

[ 2] Matsubara Y., Indo Y., Naito E., Ozasa H., Glassberg R., Vockley J., Ikeda Y., Kraus J., Tanaka K. J. Biol.
Chem. 264:16321-16331(1989).

[ 3] Aoyama T., Ueno I., Kamijo T., Hashimoto T. J. Biol. Chem. 269:19088-19094(1994).
[ 4] Eichler K., Bourgis F., Buchet A., Kleber H.-P., Mandrand-Berthelot M.-A. Mol. Microbiol. 13:775-786(1994).



20. (Acyl transf ) 
Acyl transferase domain
[0209] Number of members: 161 



[...] at 1.5 A resolution. Crystal structure of a fatty acid synthase component. J Biol Chem 1995;270:12961-12964.



21. Acylphosphatase signatures 



Acylphosphatase catalyzes the hydrolysis of various acylphosphate carboxyl-phosphate bonds. The physiological role of this
enzyme is not yet clear. Acylphosphatase is a small protein of around 100 amino-acid residues. There are two known
isozymes. One seems to be specific to muscular tissues; the other, called 'organ-common type', is found in many
different tissues. While acylphosphatases have so far been characterized only in vertebrates, there are a number of
bacterial and archaebacterial hypothetical proteins that are highly similar to that enzyme and that probably possess the
same activity. These proteins are: - Escherichia coli hypothetical protein [...]. - Bacillus subtilis protein
yflL. - Archaeoglobus fulgidus hypothetical protein [...]. Two conserved regions were selected as signature
patterns. The first is located in the N-terminal section, while the second is localized in the central part of the protein
sequence.

Consensus pattern: [LIV]-x-G-x-V-Q-G-V-x-[FM]-R

Consensus pattern: G-[FYW]-[AVC]-[KRQAM]-N-x(3)-G-x-V-x(5)-G

[ 1] Stefani M., Ramponi G. Life Chem. Rep. 12:271-301(1995).

[ 2] Stefani M., Taddei N., Ramponi G. Cell. Mol. Life Sci. 53:141-151(1997).

22. (Adap comp sub) 

Clathrin adaptor complexes medium chain signatures.

[...] are known as adaptor or clathrin assembly protein (AP) complexes. The adaptor complexes are believed to interact
with the cytoplasmic tails of membrane proteins, leading to their selection and concentration. In mammals two types of
adaptor complexes are known: AP-1, which is associated with the Golgi complex, and AP-2, which is associated with
the plasma membrane. Both AP-1 and AP-2 are heterotetramers that consist of two large chains, the adaptins
(gamma and beta' in AP-1; alpha and beta in AP-2); a medium chain [...]; and a small chain
(AP19 in AP-1; AP17 in AP-2). The medium chains [...]
(gene APM1 or YAP54) [3]. Some more divergent, but clearly evolutionarily related, proteins have also been found in
yeast: APM2 and YBR288c. Two conserved regions were selected as signature patterns, one located in the N-terminal
region, the other from the central section of these proteins.

— p^ 

[1] Pearse B.M., Robinson M.S. Annu. Rev. Cell Biol. 6:151-171(1990) 
[2] Lee J., Jongeward G.D., Sternberg P.W. Genes Dev. 8:60-73(1994).

GOeb ' M - °' B * C G B - L ™ 5 • Kirchhausen T. Eu, , Biochem. 202: 
23. (Adenylsucc synt)
Adenylosuccinate synthetase signatures

[0212] Adenylosuccinate synthetase (EC 6.3.4.4) [1] plays an important role in purine biosynthesis, by catalyzing the
GTP-dependent conversion of IMP and aspartic acid to AMP. Adenylosuccinate synthetase has been characterized
from various sources ranging from Escherichia coli (gene purA) to vertebrate tissues. In vertebrates, two isozymes are
present: one involved in purine biosynthesis and the other in the purine nucleotide cycle. Two conserved regions were
selected as signature patterns. The first one is a perfectly conserved octapeptide located in the N-terminal section and
which is involved in GTP-binding [2]. The second one includes a lysine residue known [2] to be essential for the enzyme's
activity.

Consensus pattern: Q-W-G-D-E-G-K-G

Consensus pattern: G-I-[GR]-P-x-Y-x(2)-K-x(2)-R [K is the active site residue]

[ 1] Wiesmueller L., Wittbrodt J., Noegel A.A., Schleicher M. J. Biol. Chem. 266:2480-2485(1991).

[ 2] Silva M.M., Poland B.W., Hoffman C.R., Fromm H.J., Honzatko R.B. J. Mol. Biol. 254:431-446(1995).

[ 3] Bouyoub A., Barbier G., Forterre P., Labedan B. J. Mol. Biol. 261:144-154(1996).

24. (AdoHcyase) 

S-adenosyl-L-homocysteine hydrolase signatures

[0213] S-adenosyl-L-homocysteine hydrolase (EC 3.3.1.1) (AdoHcyase) is an enzyme of the activated methyl cycle,
responsible for the reversible hydration of S-adenosyl-L-homocysteine into adenosine and homocysteine.
AdoHcyase is a ubiquitous enzyme which binds and requires NAD+ as a cofactor. AdoHcyase is a highly conserved protein
[1] of about 430 to 470 amino acids. Two highly conserved regions were selected as signature patterns. The first pattern
is located in the N-terminal section; the second is derived from a glycine-rich region in the central part of AdoHcyase,
a region thought to be involved in NAD-binding.

Consensus pattern: [GSA]-[CS]-N-x-[FYLM]-S-[ST]-[QA]-[DEN]-x-[AV]-[AT]-[AD]-[AC]-[LIVMCG]
Consensus pattern: [GA]-[KS]-x(3)-[LIV]-x-G-[FY]-G-x-[VC]-G-[KRL]-G-x-[ASC]

[ 1] Sganga M.W., Aksamit R.R., Cantoni G.L., Bauer C.E. Proc. Natl. Acad. Sci. U.S.A. 89:6328-6332(1992).

25. AhpC/TSA family 

[0214] This family contains proteins related to alkyl hydroperoxide reductase (AhpC) and thiol specific
antioxidant (TSA).

[1] Chae HZ, Robison K, Poole LB, Church G, Storz G, Rhee SG, Proc Natl Acad Sci U S A 1994;91:7017-7021 

26. (Aldose epim) 

Aldose 1-epimerase putative active site

[0215] Aldose 1-epimerase (EC 5.1.3.3) (mutarotase) is the enzyme responsible for the anomeric interconversion
of D-glucose and other aldoses between their alpha- and beta-forms. The sequence of mutarotase from two bacteria,
Acinetobacter calcoaceticus and Streptococcus thermophilus, is available [1].
It has also been shown that, on the basis of extensive sequence similarities, a mutarotase domain seems to be present
in the C-terminal half of the fungal GAL10 protein, which encodes, in the N-terminal part, UDP-glucose 4-epimerase.
The best conserved region in the sequence of mutarotase is centered around a conserved histidine residue which may
be involved in the catalytic mechanism.
Consensus pattern: [NS]-x-T-N-H-x-Y-[FW]-N-[LI] 
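
The consensus patterns in this listing follow the PROSITE notation: residues are separated by hyphens, "x" stands for any residue, square brackets list the residues allowed at a position, curly brackets list residues that are excluded, and a parenthesised number or range gives a repeat count. As a minimal sketch, not part of the original disclosure, the Python function below translates such a pattern into a regular expression and scans a sequence with it. The name prosite_to_regex and the test peptide are hypothetical and serve only as illustration.

```python
import re

# One pattern element: 'x', a single residue, an allowed set [..] or an
# excluded set {..}, optionally followed by a repeat count (n) or range (m,n).
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style consensus pattern into a Python regex (illustrative helper)."""
    parts = []
    for element in pattern.strip().split("-"):
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unrecognised pattern element: {element!r}")
        core, low, high = match.groups()
        if core == "x":
            core = "."                      # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # any residue except those listed
        if low is not None:
            # (n) becomes {n}; (m,n) becomes {m,n}
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)


# Aldose 1-epimerase pattern as given in this entry.
MUTAROTASE = "[NS]-x-T-N-H-x-Y-[FW]-N-[LI]"
regex = prosite_to_regex(MUTAROTASE)

# Made-up peptide containing one matching stretch, for demonstration only.
hit = re.search(regex, "MKSGTNHAYWNLQ")
print(regex, hit.start() if hit else None)
```

A pattern element that the helper cannot parse (for example one damaged in transcription) raises ValueError instead of silently producing a regex that matches something else.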

[ 1] Poolman B., Royer T.J., Mainzer S.E., Schmidt B.R. J. Bacteriol. 172:4037-4047(1990).

27. (AlkA DNA repair) 

Alkylbase DNA glycosidases alkA family signature 

[0216] Alkylbase DNA glycosidases [1] are DNA repair enzymes that hydrolyze the deoxyribose N-glycosidic bond
to excise various alkylated bases from a damaged DNA polymer. In Escherichia coli there are two alkylbase DNA
glycosidases: one (gene tag) which is constitutively expressed and which is specific for the removal of 3-methyladenine
(EC 3.2.2.20), and one (gene alkA) which is induced during adaptation to alkylation and which can remove a variety
of alkylation products (EC 3.2.2.21). Tag and alkA [...] similarity. In [...] there is an
alkylbase DNA glycosidase (gene MAG1) [...] which is
structurally related to alkA. MAG and alkA [...]. While the [...]
terminal ends appear to be unrelated, there [...] well conserved. A portion
of this region has been selected as a signature pattern.

Consensus pattern: G-.-G-x-W-[ST]-[AV]-x-[LIVMFY](2)-x-[LIVM] [...]

[1] Lindahl T., Sedgwick B. Annu. Rev. Biochem. 57:133-157(1988).

[ 2] [...] Bjelland S. [...]

[ 3] Chen J., Derfler B., Samson L. EMBO J. 9:4569-4575(1990).

28. Ammonium transporters signature 

[0217] A number of proteins involved in the transport of ammonium [...] a membrane as well as some yet
uncharacterized proteins have been shown [1,2] to be [...] related. These proteins are: - Yeast ammonium
transporters MEP1, MEP2 and MEP3. - [...] ammonium transporter (gene AMT1). -
Corynebacterium glutamicum ammonium and [...] transport system. - Escherichia coli putative ammonium
transporter amtB. - Bacillus subtilis nrgA. - [...] Synechocystis strain PCC 6803 [...] - Methanococcus jannaschii
hypothetical proteins MJ0058 and MJ1343. - Caenorhabditis elegans [...] F49E11.3 and M195[...]. As
expected by their transport function, these proteins [...] seem to contain from 10 to 12
transmembrane domains. The best conserved region seems to be [...] the fifth (or sixth) transmembrane region and
is used as a signature pattern.

Consensus pattern: D[...]

[ 1] Ninnemann O., Jauniaux J.-C., Frommer W.B. EMBO J. 13:3464-3471(1994).

[ 2] [...] R.M., [...] B., Burkovski [...], Eikmanns B.J., Eikmanns M. [...] J. Biol. Chem. [...]

[ 3] Saier M.H. Jr. Adv. Microbiol. Physiol. 40:81-136(1998).
29. (Arch_histone) 
CBF/NF-Y subunits signatures 

[0218] Diverse DNA binding proteins are known to bind the CCAAT box, a [...] found in the
promoter and enhancer regions of a large number of genes in eukaryotes. Amongst these proteins is one known as
the CCAAT-binding factor (CBF) or NF-Y [1]. CBF [...] different
components both needed for DNA-binding. The HAP [...] to the upstream activation site of
cytochrome C iso-1 gene (CYC1) as well as [...] electron transport and activates
their expression. It also recognizes [...] evolutionarily related to CBF. The first
subunit of CBF, known as CBF-A or NF-YB in [...] yeast and as php3 in fission yeast, is a
protein of 116 to 210 amino-acid residues which [...]
domain seems to be involved in DNA-binding [...] developed from its central part. The second
subunit of CBF, known as CBF-B or NF-YA in vertebrates, HAP2 in budding yeast and php2 in fission yeast, is a protein
of 265 to 350 amino-acid residues which [...]
the 'essential core' [2], seems to consist of [...]
DNA recognition domain. A signature pattern has been developed [...] association domain.

Consensus pattern: C-V-S-E-x-.-S-F-[LIVM]-T-[...]

Consensus pattern: Y-V-N-A-K-Q-Y-x-R-I-L-K-R-R-x-[...]-A-K-[...]-E

[1] [...] Huijsduijnen R. [...] Nucleic Acids Res. 20:

[2] Olesen J.T., Fikes J.D., Guarente L. Mol. Cell. Biol. 11:611-619(1991).

30. Argininosuccinate synthase signatures 

[0219] Argininosuccinate synthase (EC 6.3.4.5) (AS) is a urea cycle enzyme that catalyzes the penultimate step in
arginine biosynthesis: the ATP-dependent ligation of citrulline to aspartate to form argininosuccinate, AMP and
pyrophosphate [1,2]. In humans, a defect in the AS gene causes citrullinemia, a genetic disease characterized by severe
vomiting spells and mental retardation. AS is a homotetrameric enzyme of chains of about 400 amino-acid residues.
An arginine seems to be important for the enzyme's catalytic mechanism. The sequences of AS from various
prokaryotes, archaebacteria and eukaryotes show significant similarity. Two signature patterns have been selected for AS.
The first is a highly conserved stretch of nine residues located in the N-terminal extremity of these enzymes; the second
is derived from a conserved region which contains one of the conserved arginine residues.
Consensus pattern: [AS]-[FY]-S-G-G-[LV]-D-T-[ST]
Consensus pattern: G-x-T-x-K-G-N-D-x(2)-R-F

[ 1] van Vliet R., Crabeel M., Boyen A., Tricot C., Staton V., Falmagne P., Nakamura Y., Baumberg S., Glansdorff
N. Gene 95:99-104(1990).

[ 2] Morris C.J., Reeve J.N. J. Bacteriol. 170:3125-3130(1988).



31. Armadillo/beta-catenin-like repeats 



[0220] Approx. 40 amino acid repeat. Tandem repeats form super-helix of helices that is proposed to mediate inter- 
action of beta-catenin with its ligands. CAUTION: This family does not contain all known armadillo repeats. 

[1] Huber AH, Nelson WJ, Weis WI, Cell 1997;90:871-882.

[2] Gumbiner BM, Curr Opin Cell Biol 1995;7:634-640. 

[3] Cavallo R, Rubenstein D, Peifer M, Curr Opin Genet Dev 1997;7:459-466. 

[4] Su LK, Vogelstein B, Kinzler KW, Science 1993;262:1734-1737. 

[5] Masiarz FR, Munemitsu S, Polakis P, Science 1993;262:1731-1734.

[6] Peifer M, Wieschaus E, Cell 1990;63:1167-1176.



32. (Asn Synthase) 
Asparagine synthase 



[0221] This family is always found associated with GATase 2. Members of this family catalyse the conversion of 
aspartate to asparagine. 



33. Asparaginase_2 
Asparaginase 12 members 

34. (Aspartyl tRNA N) 



Aminoacyl-transfer RNA synthetases class-II signatures

[0222] Aminoacyl-tRNA synthetases (EC 6.1.1.-) [1] are a group of enzymes which activate amino acids and transfer
them to specific tRNA molecules as the first step in protein biosynthesis. In prokaryotic organisms there are at least
twenty different types of aminoacyl-tRNA synthetases, one for each different amino acid. In eukaryotes there are
generally two aminoacyl-tRNA synthetases for each different amino acid: one cytosolic form and one mitochondrial form.
While all these enzymes have a common function, they are widely diverse in terms of subunit size and of quaternary
structure. The synthetases specific for alanine, asparagine, aspartic acid, glycine, histidine, lysine, phenylalanine,
proline, serine, and threonine are referred to as class-II synthetases [2 to 6] and probably have a common folding
pattern in their catalytic domain for the binding of ATP and amino acid which is different from the Rossmann fold observed
for the class I synthetases [7]. Class-II tRNA synthetases do not share a high degree of similarity; however, at least
three conserved regions are present [2,5,8]. Signature patterns have been derived from two of these regions.
Consensus pattern: [FYH]-R-x-[DE]-x(4,12)-[RH]-x(3)-F-x(3)-[DE]

Consensus pattern: [GSTALVF]-{DENQHRKP}-[GSTA]-[LIVMF]-[DE]-R-[LIVMF]-x-[LIVMSTAG]-[LIVMFY]

[ 1] Schimmel P. Annu. Rev. Biochem. 56:125-158(1987).
[ 2] Delarue M., Moras D. BioEssays 15:675-687(1993).
[ 3] Schimmel P. Trends Biochem. Sci. 16:1-3(1991).

[ 4] Nagel G.M., Doolittle R.F. Proc. Natl. Acad. Sci. U.S.A. 88:8121-8125(1991).
[ 5] Cusack S., Haertlein M., Leberman R. Nucleic Acids Res. 19:3489-3498(1991).
[ 6] Cusack S. Biochimie 75:1077-1081(1993).

[0223] [...] stimulates GTPase hydrolysis for ARD1 but not ARFs. Number of members: 34

[...] J Biol Chem 1996 [...]

37. Amino acid permeases signature 

[...] These proteins are: - Yeast general
amino acid permeases (genes GAP1, AGP2 and AGP3). - [...] (gene ALP1). - Yeast
Leu/Val/Ile permease (gene BAP2). - [...] acid permease
(gene DIP5). - Yeast asparagine/glutamine permease [...] - [...]
histidine permease (gene HIP1). - Yeast lysine permease [...] proline permease (gene PUT4). - Yeast
valine and tyrosine permease (gene VAL1/TAT1). - Yeast tryptophan permease (gene TAT2/SCM2). - Yeast choline
transport protein (gene HNM1/CTR1). - Yeast GABA permease [...] - Yeast hypothetical protein YKL174C.
- Fission yeast protein isp5. - Fission yeast hypothetical [...]
SpAC11D3.08c. - [...] - Trichoderma harzianum amino acid
permease INDA1. - Salmonella typhimurium [...] - Escherichia coli aromatic amino acid
transport protein (gene aroP). - Escherichia coli [...]/glycine transporter (gene cycA). - Escherichia coli
GABA permease (gene gabP). - [...]
specific permease (gene pheP). - [...] permease (gene proY). - Escherichia coli
and Klebsiella pneumoniae hypothetical protein [...] typhimurium hypothetical
protein [...] - Bacillus subtilis permeases rocC and rocE. [...]
seem to contain up to 12 transmembrane segments. [...]
region which is located in the second transmembrane [...]

[3] Reizer J., Finley K., Kakuda D., McLeod C.L., Reizer A., Saier M.H. Jr. Protein Sci. 2:20-30(1993).

38. aakinase (1) Glutamate 5-kinase signature

[...] Glutamate 5-kinase (EC 2.7.2.11) (GK) is the enzyme that catalyzes the first step
in the biosynthesis of proline from glutamate, the ATP-dependent phosphorylation of L-glutamate into L-glutamate
5-phosphate. In eubacteria (gene proB) and yeast [1] (gene PRO1), GK is a monofunctional protein, while in plants
and mammals, it is a bifunctional enzyme (P5CS) [2] that consists of two domains: an N-terminal GK domain and a
C-terminal gamma-glutamyl phosphate reductase domain (EC 1.2.1.41) (see <PDOC00940>). As a signature pattern, a
highly conserved glycine- and alanine-rich region located in the central section of these enzymes has been selected.
Yeast hypothetical protein YHR033w is highly similar to GK.

Consensus pattern: [GSTN]-x(2)-G-x-G-[GC]-[IM]-x-[STA]-K-[LIVM]-x-[SA]-[TCA]-x(2)-[GALV]-x(3)-G
[ 1] Li W., Brandriss M.C. J. Bacteriol. 174:4148-4156(1992).

[ 2] Hu C.-A.A., Delauney A.J., Verma D.P.S. Proc. Natl. Acad. Sci. U.S.A. 89:9354-9358(1992).
aakinase (2) Aspartokinase signature

[0227] Aspartokinase (EC 2.7.2.4) (AK) [1] catalyzes the phosphorylation of aspartate. The product of this reaction
can then be used in the biosynthesis of lysine or in the pathway leading to homoserine, which participates in the
biosynthesis of threonine, isoleucine and methionine. In Escherichia coli, there are three different isozymes which differ
in their sensitivity to repression and inhibition by Lys, Met and Thr. AK1 (gene thrA) and AK2 (gene metL) are bifunctional
enzymes which both consist of an N-terminal AK domain and a C-terminal homoserine dehydrogenase domain. AK1
is involved in threonine biosynthesis and AK2 in that of methionine. The third isozyme, AK3 (gene lysC), is
monofunctional and involved in lysine synthesis. In yeast, there is a single isozyme of AK (gene HOM3). As a signature pattern
for AK, a conserved region located in the N-terminal extremity has been selected.
Consensus pattern: [LIVM]-x-K-[FY]-G-G-[ST]-[SC]-[LIVM]
[ 1] Rafalski J.A., Falco S.C. J. Biol. Chem. 263:2146-2151(1988).

aakinase (3) Gamma-glutamyl phosphate reductase signature 

[0228] Gamma-glutamyl phosphate reductase (EC 1.2.1.41) (GPR) is the enzyme that catalyzes the second step in
the biosynthesis of proline from glutamate, the NADP-dependent reduction of L-glutamate 5-phosphate into
L-glutamate 5-semialdehyde and phosphate. In eubacteria (gene proA) and yeast [1] (gene PRO2), GPR is a monofunctional
protein, while in plants and mammals, it is a bifunctional enzyme (P5CS) [2] that consists of two domains: an N-terminal
glutamate 5-kinase domain (EC 2.7.2.11) (see <PDOC00701>) and a C-terminal GPR domain. As a signature pattern,
a conserved region that contains two histidine residues has been selected. This region is located in the last third of GPR.
Consensus pattern: V-x(5)-A-[LIV]-x-H-I-x(2)-[HY]-[GS]-[ST]-x-H-[ST]-[DE]-x-I

[ 1] Pearson B.M., Hernando Y., Payne J., Wolf S.S., Kalogeropoulos A., Schweizer M. Yeast 12:1021-1031(1996).
[ 2] Hu C.-A.A., Delauney A.J., Verma D.P.S. Proc. Natl. Acad. Sci. U.S.A. 89:9354-9358(1992).

39. (abhydrolase) alpha/beta hydrolase fold. This catalytic domain is found in a very wide range of enzymes. 

[0229] [1] Ollis DL, Cheah E, Cygler M, Dijkstra B, Frolow F, Franken SM, Harel M, Remington SJ, Silman I, Schrag
J, Sussman JL, Verschueren KHG, Goldman A, Protein Eng 1992;5:197-211. 

40. (Acid phosphat) Histidine acid phosphatases signatures 

[0230] Acid phosphatases (EC 3.1.3.2) are a heterogeneous group of proteins that hydrolyze phosphate esters,
optimally at low pH. It has been shown [1] that a number of acid phosphatases, from both prokaryotes and eukaryotes,
share two regions of sequence similarity, each centered around a conserved histidine residue. These two histidines
seem to be involved in the enzymes' catalytic mechanism [2,3]. The first histidine is located in the N-terminal section
and forms a phosphohistidine intermediate while the second is located in the C-terminal section and possibly acts as
proton donor. Enzymes belonging to this family are called 'histidine acid phosphatases' and are listed below:

- Escherichia coli pH 2.5 acid phosphatase (gene appA).
- Escherichia coli glucose-1-phosphatase (EC 3.1.3.10) (gene agp).
- Yeast constitutive and repressible acid phosphatases (genes PHO3 and PHO5).
- Fission yeast acid phosphatase (gene pho1).
- Aspergillus phytases A and B (EC 3.1.3.8) (genes phyA and phyB).
- Mammalian lysosomal acid phosphatase.
- Mammalian prostatic acid phosphatase.
- Caenorhabditis elegans hypothetical proteins B0361.7, C05C10.1, C05C10.4 and F26C11.1.

[0231] Consensus pattern: [LIVM]-x(2)-[LIVMA]-x(2)-[LIVM]-x-R-H-[GN]-x-R-x-[PAS] [H is the phosphohistidine residue]

Consensus pattern: [LIVM]-x-[LIVMFAG]-x(2)-[STAGI]-H-[...] [H is an active site residue]

Sequences known to belong to this class detected by the pattern: ALL, except for rat prostatic acid
phosphatase, which seems to have Tyr instead of the active site His.

[ 1] van Etten R.L., Davidson R., Stevis P.E., MacArthur H., Moore D.L. J. Biol. Chem. 266:2313-2319(1991).
[ 2] Ostanin K., Harms E.H., Stevis P.E., Kuciel R., Zhou M.-M., van Etten R.L. J. Biol. Chem. 267:22830-22836
(1992).

[ 3] Schneider G., Lindqvist Y., Vihko R. EMBO J. 12:2609-2615(1993).

41. Aconitase family signatures 

[0232] Aconitase (aconitate hydratase) (EC 4.2.1.3) [1] is the enzyme from the tricarboxylic acid cycle that catalyzes
the reversible isomerization of citrate and isocitrate. Cis-aconitate is formed as an intermediary product during the
course of the reaction. In eukaryotes two isozymes of aconitase are known to exist: one found in the mitochondrial
matrix and the other found in the cytoplasm. Aconitase, in its active form, contains a 4Fe-4S iron-sulfur cluster; three
cysteine residues have been shown to be ligands of the 4Fe-4S cluster. It has been shown that the aconitase family
also contains the following proteins: - Iron-responsive element binding protein (IRE-BP). IRE-BP is a cytosolic protein
that binds to iron-responsive elements (IREs). IREs are stem-loop structures found in the 5'UTR of ferritin and delta
aminolevulinic acid synthase mRNAs, and in the 3'UTR of transferrin receptor mRNA. IRE-BP also expresses aconitase
activity. - 3-isopropylmalate dehydratase (EC 4.2.1.33) (isopropylmalate isomerase), the enzyme that catalyzes the
second step in the biosynthesis of leucine. - Homoaconitase (EC 4.2.1.36) (homoaconitate hydratase), an enzyme that
participates in the alpha-aminoadipate pathway of lysine biosynthesis and that converts cis-homoaconitate into
homoisocitric acid. - Escherichia coli protein ybhJ.

Consensus pattern: [LIVM]-x(2)-[GSACIVM]-x-[...] [C binds the iron-sulfur center]

Consensus pattern: G-x(2)-[LIVWPQ]-x(3)-[GAC]-C-[GSTAM]-[LIMPTA]-C-[LIMV]-[GA] [The two C's bind the iron-sulfur center]

[ 1] Gruer M.J., Artymiuk P.J., Guest J.R. Trends Biochem. Sci. 22:3-6(1997). 

42. Actins signatures 

[0233] Actins [1 to 4] are highly conserved contractile proteins that are present in all eukaryotic cells. In vertebrates 
there are three groups of actin isoforms: alpha, beta and gamma. The alpha actins are found in muscle tissues and 
are a major constituent of the contractile apparatus. The beta and gamma actins co-exist in most cell types as
components of the cytoskeleton and as mediators of internal cell motility. In plants [5] there are many isoforms which are
probably involved in a variety of functions such as cytoplasmic streaming, cell shape determination, tip growth,
graviperception, cell wall deposition, etc. Actin exists either in a monomeric form (G-actin) or in a polymerized form (F-actin).
Each actin monomer can bind a molecule of ATP; when polymerization occurs, the ATP is hydrolyzed. Actin is a protein
of from 374 to 379 amino acid residues. The structure of actin has been highly conserved in the course of evolution.
Recently some divergent actin-like proteins have been identified in several species. These proteins are: - Centractin
(actin-RPV) from mammals, fungi (yeast ACT5, Neurospora crassa ro-4) and Pneumocystis carinii (actin-II). Centractin
seems to be a component of a multi-subunit centrosomal complex involved in microtubule based vesicle motility. This
subfamily is also known as ARP1. - ARP2 subfamily which includes chicken ACTL, yeast ACT2, Drosophila 14D, C.
elegans actC. - ARP3 subfamily which includes actin 2 from mammals, Drosophila 66B, yeast ACT4 and fission yeast
act2. - ARP4 subfamily which includes yeast ACT3 and Drosophila 13E. Three signature patterns have been developed.
The first two are specific to actins and span positions 54 to 64 and 357 to 365. The last signature picks up both actins
and the actin-like proteins and corresponds to positions 106 to 118 in actins.
Consensus pattern: [FY]-[LIV]-G-[DE]-E-A-Q-x-[RKQ](2)-G
Consensus pattern: W-[IV]-[STA]-[RK]-x-[DE]-Y-[DNE]-[DE]

Consensus pattern: [LM]-[LIVM]-T-E-[GAPQ]-x-[LIVMFYWHQ]-N-[PSTAQ]-x(2)-N-[KR]
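
The positions quoted for the actin signatures can be cross-checked against the patterns themselves: a pattern with no variable-length element covers a fixed number of residues, which should equal the length of the stated interval. The sketch below uses a hypothetical helper, pattern_span (not from the original text), that counts the minimum and maximum number of residues a PROSITE-style pattern can cover. The three patterns are typed in cleaned form, which is an assumption wherever the printed characters are ambiguous (for instance, the second pattern is taken to contain the two classes [STA] and [RK]).

```python
import re

# One pattern element, optionally followed by a repeat count (n) or range (m,n).
_ELEMENT = re.compile(r"(?:x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def pattern_span(pattern: str) -> tuple[int, int]:
    """Return the (shortest, longest) number of residues the pattern can cover."""
    shortest = longest = 0
    for element in pattern.strip().split("-"):
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unrecognised pattern element: {element!r}")
        low = int(match.group(1)) if match.group(1) else 1   # no count means one residue
        high = int(match.group(2)) if match.group(2) else low
        shortest += low
        longest += high
    return shortest, longest


# Stated positions in actin mapped to the corresponding pattern (cleaned, assumed reading).
ACTIN_SIGNATURES = {
    (54, 64): "[FY]-[LIV]-G-[DE]-E-A-Q-x-[RKQ](2)-G",
    (357, 365): "W-[IV]-[STA]-[RK]-x-[DE]-Y-[DNE]-[DE]",
    (106, 118): "[LM]-[LIVM]-T-E-[GAPQ]-x-[LIVMFYWHQ]-N-[PSTAQ]-x(2)-N-[KR]",
}

for (first, last), pattern in ACTIN_SIGNATURES.items():
    # Print the stated interval, the computed span, and the interval length.
    print(first, last, pattern_span(pattern), last - first + 1)
```

Under this reading each pattern covers exactly as many residues as its stated interval, which is a quick way to spot a pattern that lost or gained an element in transcription.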

[ 1] Sheterline P., Clayton J., Sparrow J.C. (In) Actins, 3rd Edition, Academic Press Ltd, London, (1996).
[ 2] Pollard T.D., Cooper J.A. Annu. Rev. Biochem. 55:987-1036(1986).
[ 3] Pollard T.D. Curr. Opin. Cell Biol. 1:33-40(1990).
[ 4] Rubenstein P.A. BioEssays 12:309-315(1990).

[ 5] Meagher R.B., McLean B.G. Cell Motil. Cytoskeleton 16:164-166(1990).
43. Adenylate kinase signature 

[0234] Adenylate kinase (EC 2.7.4.3) (AK) [1] is a small monomeric enzyme that catalyzes the reversible transfer of
a phosphate group from MgATP to AMP (MgATP + AMP = MgADP + ADP). In mammals there are three different isozymes:
- AK1 (or myokinase), which is cytosolic. - AK2, which is located in the outer compartment of mitochondria. - AK3 (or
GTP:AMP phosphotransferase), which is located in the mitochondrial matrix and which uses MgGTP instead of MgATP. The
sequence of AK has also been obtained from different bacterial species and from plants and fungi. Two other enzymes have been
found to be evolutionarily related to AK. These are: - Yeast uridylate kinase (EC 2.7.4.-) (UK) (gene URA6) [2] which
catalyzes the transfer of a phosphate group from ATP to UMP to form UDP and ADP. - Slime mold UMP-CMP kinase
(EC 2.7.4.14) [3] which catalyzes the transfer of a phosphate group from ATP to either CMP or UMP to form CDP or
UDP and ADP. Several regions of AK family enzymes are well conserved, including the ATP-binding domains. The
most conserved of all regions has been selected as a signature for this type of enzyme. This region includes an
aspartic acid residue that is part of the catalytic cleft of the enzyme and that is involved in a salt bridge. It also includes
an arginine residue whose modification leads to inactivation of the enzyme.
Consensus pattern: [LIVMFYW](3)-D-G-[FYI]-P-R-x(3)-[NQ]

[ 1] Schulz G.E. Cold Spring Harbor Symp. Quant. Biol. 52:429-439(1987).

[ 2] Liljelund P., Sanni A., Friesen J.D., Lacroute F. Biochem. Biophys. Res. Commun. 165:464-473(1989).
[ 3] Wiesmueller L., Noegel A.A., Barzu O., Gerisch G., Schleicher M. J. Biol. Chem. 265:6339-6345(1990).
[ 4] Kath T.H., Schmid R., Schaefer G. Arch. Biochem. Biophys. 307:405-410(1993).

[0235] 44. (adh_short) Short-chain dehydrogenases/reductases family signature. 

[0236] The short-chain dehydrogenases/reductases family (SDR) [1] is a very large family of enzymes, most of which
are known to be NAD- or NADP-dependent oxidoreductases. As the first member of this family to be characterized
was Drosophila alcohol dehydrogenase, this family used to be called [2,3,4] 'insect-type', or 'short-chain', alcohol
dehydrogenases. Most members of this family are proteins of about 250 to 300 amino acid residues. The proteins currently
known to belong to this family are listed below. - Alcohol dehydrogenase (EC 1.1.1.1) from insects such as Drosophila.
- Acetoin dehydrogenase (EC 1.1.1.5) from Klebsiella terrigena (gene budC). - D-beta-hydroxybutyrate dehydrogenase
(BDH) (EC 1.1.1.30) from mammals. - Acetoacetyl-CoA reductase (EC 1.1.1.36) from various bacterial species (gene
phbB or phaB). - Glucose 1-dehydrogenase (EC 1.1.1.47) from Bacillus. - 3-beta-hydroxysteroid dehydrogenase
(EC 1.1.1.51) from Comamonas testosteroni. - 20-beta-hydroxysteroid dehydrogenase (EC 1.1.1.53) from Streptomyces
hydrogenans. - Ribitol dehydrogenase (EC 1.1.1.56) (RDH) from Klebsiella aerogenes. - Estradiol
17-beta-dehydrogenase (EC 1.1.1.62) from human. - Gluconate 5-dehydrogenase (EC 1.1.1.69) from Gluconobacter oxydans (gene
gno). - 3-oxoacyl-[acyl-carrier protein] reductase (EC 1.1.1.100) from Escherichia coli (gene fabG) and from plants. -
Retinol dehydrogenase (EC 1.1.1.105) from mammals. - 2-deoxy-d-gluconate 3-dehydrogenase (EC 1.1.1.125) from
Escherichia coli and Erwinia chrysanthemi (gene kduD). - Sorbitol-6-phosphate 2-dehydrogenase (EC 1.1.1.140) from
Escherichia coli (gene gutD) and from Klebsiella pneumoniae (gene sorD). - 15-hydroxyprostaglandin dehydrogenase
(NAD+) (EC 1.1.1.141) from human. - Corticosteroid 11-beta-dehydrogenase (EC 1.1.1.146) (11-DH) from mammals.
- 7-alpha-hydroxysteroid dehydrogenase (EC 1.1.1.159) from Escherichia coli (gene hdhA), Eubacterium strain VPI
12708 (gene baiA) and from Clostridium sordellii. - NADPH-dependent carbonyl reductase (EC 1.1.1.184) from
mammals. - Tropinone reductase-I (EC 1.1.1.206) and -II (EC 1.1.1.236) from plants. - N-acylmannosamine
1-dehydrogenase (EC 1.1.1.233) from Flavobacterium strain 141-8. - D-arabinitol 2-dehydrogenase (ribulose forming)
(EC 1.1.1.250) from fungi. - Tetrahydroxynaphthalene reductase (EC 1.1.1.252) from Magnaporthe grisea. - Pteridine
reductase 1 (EC 1.1.1.253) (gene PTR1) from Leishmania. - 2,5-dichloro-2,5-cyclohexadiene-1,4-diol dehydrogenase
(EC 1.1.-.-) from Pseudomonas paucimobilis. - Cis-1,2-dihydroxy-3,4-cyclohexadiene-1-carboxylate dehydrogenase
(EC 1.3.1.-) from Acinetobacter calcoaceticus (gene benD) and Pseudomonas putida (gene xylL). -
Biphenyl-2,3-dihydro-2,3-diol dehydrogenase (EC 1.3.1.-) (gene bphB) from various Pseudomonaceae. - Cis-toluene dihydrodiol
dehydrogenase (EC 1.3.1.-) from Pseudomonas putida (gene todD). - Cis-benzene glycol dehydrogenase (EC 1.3.1.19)
from Pseudomonas putida (gene bnzE). - 2,3-dihydro-2,3-dihydroxybenzoate dehydrogenase (EC 1.3.1.28) from
Escherichia coli (gene entA) and Bacillus subtilis (gene dhbA). - Dihydropteridine reductase (EC 1.6.99.7) (HDHPR) from
mammals. - Lignin degradation enzyme ligD from Pseudomonas paucimobilis. - Agropine synthesis reductase from
Agrobacterium plasmids (gene mas1). - Versicolorin reductase from Aspergillus parasiticus (gene VER1). - Putative
keto-acyl reductases from Streptomyces polyketide biosynthesis operons. - A trifunctional
hydratase-dehydrogenase-epimerase from the peroxisomal beta-oxidation system of Candida tropicalis. This protein contains two tandemly
repeated 'short-chain dehydrogenase-type' domains in its N-terminal extremity. - Nodulation protein nodG from species
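[...] is involved
in the biosynthesis of D-alanyl-lipoteichoic acid. - Human follicular variant translocation protein 1 (FVT1). - Mouse
adipocyte protein p27. - Mouse protein Ke 6. - Maize sex determination protein TASSELSEED 2. - Sarcophaga
peregrina 25 Kd development specific protein. - Drosophila fat body protein P6. - A Listeria monocytogenes hypothetical
protein encoded in the internalins gene region. - Escherichia coli hypothetical protein yciK. - Escherichia coli
hypothetical protein ydfG. - Escherichia coli hypothetical protein yjgI. - Escherichia coli hypothetical protein yjgU. - Escherichia
coli hypothetical protein yohF. - Bacillus subtilis hypothetical protein yoxD. - Bacillus subtilis hypothetical protein ywfD.
- Bacillus subtilis hypothetical protein ywfH. - Yeast hypothetical protein YIL124W. - Yeast hypothetical protein YIR035c.
- Yeast hypothetical protein YIR036c. - Yeast hypothetical protein YKL055c. - Fission yeast hypothetical protein
SpAC23D3.11. One of the best conserved regions [...]

[ 1] Joernvall H., Persson B., Krook M., Atrian S., Gonzalez-Duarte R., Jeffery J., Ghosh D. Biochemistry 34:
6003-6013(1995).

[ 2] Villarroya A., Juan E., Egestad B., Joernvall H. Eur. J. Biochem. 180:191-197(1989).
[ 3] Persson B., Krook M., Joernvall H. Eur. J. Biochem. 200:537-543(1991).
[ 4] Neidle E.L., Hartnett C., Ornston N.L., Bairoch A., Rekik M., Harayama S. Eur. J. Biochem. 204:113-120(1992).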

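45. [...] Short-chain dehydrogenases/reductases family signature

The short-chain dehydrogenases/reductases family (SDR) [1] is a very large family of enzymes, most of which are
known to be NAD- or NADP-dependent oxidoreductases. As the first member of this family to be characterized was
Drosophila alcohol dehydrogenase, this family used to be called [2,3,4] 'insect-type', or 'short-chain', alcohol
dehydrogenases. Most members of this family are proteins of about 250 to 300 amino acid residues. The proteins currently
known to belong to this family are listed below. - Alcohol dehydrogenase (EC 1.1.1.1) from insects such as Drosophila.
- Acetoin dehydrogenase (EC 1.1.1.5) from Klebsiella terrigena (gene budC). - D-beta-hydroxybutyrate dehydrogenase
(BDH) (EC 1.1.1.30) from mammals. - Acetoacetyl-CoA reductase (EC 1.1.1.36) from various bacterial species (gene
phbB or phaB). - Glucose 1-dehydrogenase (EC 1.1.1.47) from Bacillus. - 3-beta-hydroxysteroid dehydrogenase
(EC 1.1.1.51) from Comamonas testosteroni. - 20-beta-hydroxysteroid dehydrogenase (EC 1.1.1.53) from Streptomyces
hydrogenans. - Ribitol dehydrogenase (EC 1.1.1.56) (RDH) from Klebsiella aerogenes. - Estradiol
17-beta-dehydrogenase (EC 1.1.1.62) from human. - Gluconate 5-dehydrogenase (EC 1.1.1.69) from Gluconobacter oxydans (gene
gno). - 3-oxoacyl-[acyl-carrier protein] reductase (EC 1.1.1.100) from Escherichia coli (gene fabG) and from plants. -
Retinol dehydrogenase (EC 1.1.1.105) from mammals. - 2-deoxy-d-gluconate 3-dehydrogenase (EC 1.1.1.125) from
Escherichia coli and Erwinia chrysanthemi (gene kduD). - Sorbitol-6-phosphate 2-dehydrogenase (EC 1.1.1.140) from
Escherichia coli (gene gutD) and from Klebsiella pneumoniae (gene sorD). - 15-hydroxyprostaglandin dehydrogenase
(NAD+) (EC 1.1.1.141) from human. - Corticosteroid 11-beta-dehydrogenase (EC 1.1.1.146) (11-DH) from mammals.
- 7-alpha-hydroxysteroid dehydrogenase (EC 1.1.1.159) from Escherichia coli (gene hdhA), Eubacterium strain VPI
12708 (gene baiA) and from Clostridium sordellii. - NADPH-dependent carbonyl reductase (EC 1.1.1.184) from
mammals. - Tropinone reductase-I (EC 1.1.1.206) and -II (EC 1.1.1.236) from plants. - N-acylmannosamine
1-dehydrogenase (EC 1.1.1.233) from Flavobacterium strain 141-8. - D-arabinitol 2-dehydrogenase (ribulose forming)
(EC 1.1.1.250) from fungi. - Tetrahydroxynaphthalene reductase (EC 1.1.1.252) from Magnaporthe grisea. - Pteridine
reductase 1 (EC 1.1.1.253) (gene PTR1) from Leishmania. - 2,5-dichloro-2,5-cyclohexadiene-1,4-diol dehydrogenase
(EC 1.1.-.-) from Pseudomonas paucimobilis. - Cis-1,2-dihydroxy-3,4-cyclohexadiene-1-carboxylate dehydrogenase
(EC 1.3.1.-) from Acinetobacter calcoaceticus (gene benD) and Pseudomonas putida (gene xylL). -
Biphenyl-2,3-dihydro-2,3-diol dehydrogenase (EC 1.3.1.-) (gene bphB) from various Pseudomonaceae. - Cis-toluene dihydrodiol
dehydrogenase (EC 1.3.1.-) from Pseudomonas putida (gene todD). - Cis-benzene glycol dehydrogenase (EC 1.3.1.19)
from Pseudomonas putida (gene bnzE). - 2,3-dihydro-2,3-dihydroxybenzoate dehydrogenase (EC 1.3.1.28) from
Escherichia coli (gene entA) and Bacillus subtilis (gene dhbA). - Dihydropteridine reductase (EC 1.6.99.7) (HDHPR) from
mammals. - Lignin degradation enzyme ligD from Pseudomonas paucimobilis. - Agropine synthesis reductase from
Agrobacterium plasmids (gene mas1). - Versicolorin reductase from Aspergillus parasiticus (gene VER1). - Putative
keto-acyl reductases from Streptomyces polyketide biosynthesis operons. - A trifunctional
hydratase-dehydrogenase-epimerase from the peroxisomal beta-oxidation system [...]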
in the biosynthesis of D-alanyl-lipoteichoic acid. - Human follicular variant translocation protein 1 (FVT1). - Mouse
adipocyte protein p27. - Mouse protein Ke 6. - Maize sex determination protein TASSELSEED 2. - Sarcophaga
peregrina 25 Kd development specific protein. - Drosophila fat body protein P6. - A Listeria monocytogenes hypothetical
protein encoded in the internalins gene region. - Escherichia coli hypothetical protein yciK. - Escherichia coli
hypothetical protein ydfG. - Escherichia coli hypothetical protein yjgI. - Escherichia coli hypothetical protein yjgU. - Escherichia
coli hypothetical protein yohF. - Bacillus subtilis hypothetical protein yoxD. - Bacillus subtilis hypothetical protein ywfD.
- Bacillus subtilis hypothetical protein ywfH. - Yeast hypothetical protein YIL124W. - Yeast hypothetical protein YIR035c.
- Yeast hypothetical protein YIR036c. - Yeast hypothetical protein YKL055c. - Fission yeast hypothetical protein
SpAC23D3.11. One of the best conserved regions, which includes two perfectly conserved residues, a tyrosine and a
lysine, has been used as a signature pattern for this family of proteins. The tyrosine residue participates in the catalytic
mechanism.

Consensus pattern: [LIVSPADNK]-x(12)-Y-[PSTAGNCV]-[STAGNQCIVM]-[STAGC]-K-{PC}-[SAGFYR]-[LIVMSTAGD]-
x(2)-[LIVMFYW]-x(3)-[LIVMFYWGAPTHQ]-[GSACQRHM] [Y is an active site residue]

[ 1] Joernvall H., Persson B., Krook M., Atrian S., Gonzalez-Duarte R., Jeffery J., Ghosh D. Biochemistry 34:
6003-6013(1995).

[ 2] Villarroya A., Juan E., Egestad B., Joernvall H. Eur. J. Biochem. 180:191-197(1989).
[ 3] Persson B., Krook M., Joernvall H. Eur. J. Biochem. 200:537-543(1991).

[ 4] Neidle E.L., Hartnett C., Ornston N.L., Bairoch A., Rekik M., Harayama S. Eur. J. Biochem. 204:113-120(1992).

[0238] 46. (adh_zinc) Zinc-containing alcohol dehydrogenases signatures

Alcohol dehydrogenase (EC 1.1.1.1) (ADH) catalyzes the reversible oxidation of ethanol to acetaldehyde with the
concomitant reduction of NAD [1]. Currently three structurally and catalytically different types of alcohol
dehydrogenases are known: - Zinc-containing 'long-chain' alcohol dehydrogenases. - Insect-type, or 'short-chain'
alcohol dehydrogenases. - Iron-containing alcohol dehydrogenases. Zinc-containing ADH's [2,3] are dimeric or
tetrameric enzymes that bind two atoms of zinc per subunit. One of the zinc atoms is essential for catalytic
activity while the other is not. Both zinc atoms are coordinated by either cysteine or histidine residues; the
catalytic zinc is coordinated by two cysteines and one histidine. Zinc-containing ADH's are found in bacteria,
mammals, plants, and fungi. In most species there is more than one isozyme (for example, humans have at least six
isozymes, yeast has three, etc.). A number of other zinc-dependent dehydrogenases are closely related to zinc ADH
[4]; these are: - Xylitol dehydrogenase (EC 1.1.1.9) (D-xylulose reductase). - Sorbitol dehydrogenase (EC 1.1.1.14).
- Aryl-alcohol dehydrogenase (EC 1.1.1.90) (benzyl alcohol dehydrogenase). - Threonine 3-dehydrogenase
(EC 1.1.1.103). - Cinnamyl-alcohol dehydrogenase (EC 1.1.1.195) (CAD) [5]. CAD is a plant enzyme involved in the
biosynthesis of lignin. - Galactitol-1-phosphate dehydrogenase (EC 1.1.1.251). - Pseudomonas putida 5-exo-alcohol
dehydrogenase (EC 1.1.1.-) [6]. - Escherichia coli starvation sensing protein rspB. - Escherichia coli hypothetical
protein yjgB. - Escherichia coli hypothetical protein yjgV. - Escherichia coli hypothetical protein yjjN. - Yeast
hypothetical protein YAL060w (FUN49). - Yeast hypothetical protein YAL061W (FUN50). - Yeast hypothetical protein
YCR105W. The pattern that has been developed to detect this class of enzymes is based on a conserved region that
includes a histidine residue which is the second ligand of the catalytic zinc atom. This family also includes
NADP-dependent quinone oxidoreductase (EC 1.6.5.5), an enzyme found in bacteria (gene qor), in yeast and in mammals
where, in some species such as rodents, it has been recruited as an eye lens protein and is known as zeta-crystallin
[7]. The sequence of quinone oxidoreductase is distantly related to that of other zinc-containing alcohol
dehydrogenases and it lacks the zinc-ligand residues. The torpedo fish and mammalian synaptic vesicle membrane
protein vat-1 is related to qor. A specific pattern has been developed for this subfamily.
Consensus pattern: G-H-E-x(2)-G-x(5)-[GA]-x(2)-[IVSAC] [H is a zinc ligand] 
Consensus pattern: [GSD]-[DEQH]-x(2)-L-x(3)-[SA](2)-G-G-x-G-x(4)-Q-x(2)-[KR]
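
The consensus patterns used throughout this section follow a compact notation: a bare letter is a required residue, square brackets list the residues allowed at a position, curly braces list the residues excluded, x is any residue, and a number in parentheses is a repeat count or range. The following is a minimal sketch, not part of the original disclosure, of how such a pattern could be translated into a regular expression and searched against a protein sequence. The function names and the toy sequence are hypothetical and chosen only for illustration.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style consensus pattern into a Python regex string."""
    parts = []
    for element in pattern.strip().strip("-").split("-"):
        element = element.strip()
        if not element:
            continue
        # Separate the residue specification from an optional repeat, e.g. x(2) or x(4,6).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.group(1), m.group(2), m.group(3)
        if core == "x":
            atom = "."                      # any residue
        elif core.startswith("[") and core.endswith("]"):
            atom = core                     # allowed residues
        elif core.startswith("{") and core.endswith("}"):
            atom = "[^" + core[1:-1] + "]"  # excluded residues
        elif len(core) == 1 and core.isalpha():
            atom = core                     # single required residue
        else:
            raise ValueError(f"unrecognised pattern element: {element!r}")
        if low is None:
            repeat = ""
        elif high is None:
            repeat = "{" + low + "}"
        else:
            repeat = "{" + low + "," + high + "}"
        parts.append(atom + repeat)
    return "".join(parts)


def find_signature(pattern: str, sequence: str):
    """Return 1-based inclusive (start, end) spans of every non-overlapping match."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.end()) for m in regex.finditer(sequence.upper())]


# Zinc-containing ADH signature as given above; the sequence is a made-up toy example.
ADH_ZINC = "G-H-E-x(2)-G-x(5)-[GA]-x(2)-[IVSAC]"
toy_sequence = "MKGHEAAGAAAAAGAAVKK"
print(prosite_to_regex(ADH_ZINC))
print(find_signature(ADH_ZINC, toy_sequence))
```

With the toy sequence, the signature is found at residues 3 to 17, which places the zinc-ligand histidine at residue 4. A match of this kind only flags a candidate; it does not by itself establish enzymatic function.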

[ 1] Branden C.-I., Joernvall H., Eklund H., Furugren B. (In) The Enzymes (3rd edition) 11:104-190(1975).
[ 2] Joernvall H., Persson B., Jeffery J. Eur. J. Biochem. 167:195-201(1987).
[ 3] Sun H.-W., Plapp B.V. J. Mol. Evol. 34:522-535(1992).
[ 4] Persson B., Hallborn J., Walfridsson M., Hahn-Haegerdal B., Keraenen S., Penttilae M., Joernvall H. FEBS Lett. 324:9-14(1993).
[ 5] Knight M.E., Halpin C., Schuch W. Plant Mol. Biol. 19:793-801(1992).
[ 6] Koga H., Aramaki H., Yamaguchi E., Takeuchi K., Horiuchi T., Gunsalus I.C. J. Bacteriol. 166:1089-1095(1986).
[ 7] Joernvall H., Persson B., Du Bois G., Lavers G.C., Chen J.H., Gonzalez P., Rao P.V., Zigler J.S. Jr. FEBS Lett. 322:240-244(1993).



[0239] 47. (aldedh) Aldehyde dehydrogenases active sites 

[0240] Aldehyde dehydrogenases (EC 1.2.1.3 and EC 1.2.1.5 ) are enzymes which oxidize a wide variety of aliphatic 
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^.andclass.Va^^ 

bacterial species. A number of enzymes are knoZ to £ f 8 3l$0 been se 9 u «*ed from fungal and 

' enzymes are listed betow. - P,an,sld KUZ£i£23^ ,0 ^ ^^-asesMbese 
catalyzes the las, step in the biosynthesis of b*£?E£2E^ <E ° [2]. an enzyme tha, 

phate dehydrogenase (EC 1219) - EscherfchtarnL ^ 

(EC1-1M-22) (gene aid) f4]. - Mammalian succinate ZuZl Eschenche col, lactaldehyde dehydrogenase 
» coii phenyiacetaldehyde dehydrogenase (EcTl2 1 ^T^E^he^^ !!• r*^ 1 ^^ - Escherichia 

iawehyde dehydrogenase (gene hpcC). - PsetSSL o^T^l^^^ 2 '^^^ s °"> 
^™PCandxy.G).anenzymeh,^ 

- Bacterial and mammalian methylmalonate-semialdehv^ J , deg rada, ™ 0, Phenols,cresols and catechol 
invoked in me dista. pathway of J^ST^SSZ: 2&&> H - enzyme 

a delta-1 -pyrroline- 5-carboxylate ^r^JSr^TJT ^ PUtA Protein ' ^ 
induced by dehydration of shoots {8] M^TantoSvH^? h f PM P, ° tein °' Unknown ,unc,i °" which is 
cytosolic enzyme responsible for, LmD^TJ^TZ^t^ 109 ^ (EC ^> W ™- «• * 
rahydrofoa.e. .» is an protein of about 900^*^^^"^ * into tet- 

residues) is structurally and functionally related ^S^^J^ H""* *" C " le ™ nal f 460 

- Yeast hypothec, protein YER073w. - Yeas, * PM YBR006w " 
protem F01F1.6.A glutemic acid and a cys.eine Ldue X^X. Sh' ^^ e, ° 9anS ^"etfcal 
aldehyde dehydrogenase. These residues are conseSdTafS 2^™^°^° aC,Mty ot ™alian 
derived for this family, one for each of the active^e SuL * * * ,h ' S tem * Two P attems havo been 

! ?! w T' l^ arper K - Lk,dahl R Bi «*emistry 28:1160-1167(1989) 
[0241] 48. Aldo/keto reductase family signatures 

ductase (EC VVK2). - Aldose reductase (EC 111 21 ^ -^tonTh\Jdroxv*t° ^V* '* " - 

term.nates androgen action by converting ^^L^^ V ^^ B ^ Mvdtt ^^ (EC 1.1.1.50) . which 
synthase (EC 1.1.1.-, 88 ) which c^^!2^^ MI09,erone l ° ^■Pha-andros.anediol. - Prosiaqbndin F 
Phate H2 and D2 to F2-a,pha. - r^oS^os 

^i ? Plasmid P MDH7.2 i^AyZ^^Z^T^S^^ ^ 
rdecone (kepone) to the corresponding alcohol - 2 S^ketoTnhJ^n 1 ! ? " h redUCes ,he pes,icide <*><>- 
the reduction of 2.5^ike«og.uconic acid to 2-ketc-L JSSSL h ^ (EC 1 ^ ^ cata ^* 
- NAD(P)H-de P endent xylose reductase (EC in ^^J^T"'??* ,he produc,ion d asco ">ic acid. 
^™*<«**b«™n^ ^.f^ 8 <° d — xylose into 

13^6) which catalyzes the reduction of de^^xj^ 

synthase in the formation of 4,2-,4--trihydro xychalcone F™ ™ i ^ . redUCtaSe> WhiCh C °- ac,s with 
function is no, known. - Leishrrana mafor I^SJS^^^, 1 ?; ^ " ^ GCY ^ whos * 

abundance is markedly elevated in promastigote oiSL clevetopmenta.ly regulated protein whose 
Escherichia co.i hypothetic protein ya,B - ischeShiTco? Zn,T < T *"* ' UnC ""° n is not vel k "°wn. - 
YBR149W. - Yeast hypothetical proteir^ Tasr^TT V9hE ' YeaSt "VPo^etical protein 

300 amino acid residues. Three consensuToatternl ho h y p0,he,,cal P ro,e '" YJR096w.These proteins have all about 
The firs, one is .oca.ed in the N-te^na >ZZ o ZZoTJZT^ T ™ ^ '° « hiS ^ - ^ 
The.hirdpa.tem,™^ 
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aldehydereductases. affect the catalytic efficiency. 

Consensus pattern: G-[FY]-R-[HSAL]-[LIVMF]-D-[STAGC]-[AS]-x(5)-E-x(2)-[LIVM]-G
Consensus pattern: [LIVMFY]-x(9)-[KREQ]-x-[LIVM]-G-[LIVM]-[SC]-N-[FY]
Consensus pattern: [LIVM]-[PAIV]-[KR]-[ST]-x(4)-R-x(2)-[GSTAEQK]-[NSL]-x(2)-[LIVMFA] [K is a putative active site residue]

[ 1] Bohren K.M., Bullock B., Wermuth B., Gabbay K.H. J. Biol. Chem. 264:9547-9551(1989).
[ 2] Bruce N.C., Willey D.L., Coulson A.F.W., Jeffery J. Biochem. J. 299:805-811(1994).

[0242] 49. Alpha amylase. This family is classified as family 13 of the glycosyl hydrolases. The structure is an
8-stranded alpha/beta barrel, interrupted by a ~70 a.a. calcium-binding domain protruding between beta strand 3 and
alpha helix 3, and a carboxyl-terminal Greek key beta-barrel domain.

[1] Larson SB, Greenwood A, Cascio D, Day J, McPherson A; J Mol Biol 1994;235:1560-1584.
[0243] 50. Aminotransferases class-I pyridoxal-phosphate attachment site

Aminotransferases share certain mechanistic features with other pyridoxal-phosphate dependent enzymes, such as
the covalent binding of the pyridoxal-phosphate group to a lysine residue. On the basis of sequence similarity, these
various enzymes can be grouped [1,2] into subfamilies. One of these, called class-I, currently consists of the following
enzymes: - Aspartate aminotransferase (AAT) (EC 2.6.1.1). AAT catalyzes the reversible transfer of the amino group
from L-aspartate to 2-oxoglutarate to form oxaloacetate and L-glutamate. In eukaryotes, there are two AAT isozymes:
one is located in the mitochondrial matrix, the second is cytoplasmic. In prokaryotes, only one form of AAT is found
(gene aspC). - Tyrosine aminotransferase (EC 2.6.1.5) which catalyzes the first step in tyrosine catabolism by reversibly
transferring its amino group to 2-oxoglutarate to form 4-hydroxyphenylpyruvate and L-glutamate. - Aromatic
aminotransferase (EC 2.6.1.57) involved in the synthesis of Phe, Tyr, Asp and Leu (gene tyrB).
- 1-aminocyclopropane-1-carboxylate synthase (EC 4.4.1.14) (ACC synthase) from plants. ACC synthase catalyzes the
first step in ethylene biosynthesis. - Pseudomonas denitrificans cobC, which is involved in cobalamin biosynthesis.
- Yeast hypothetical protein YJL060w. The sequence around the pyridoxal-phosphate attachment site of this class of
enzyme is sufficiently conserved to allow the creation of a specific pattern.

Consensus pattern: [GS]-[LIVMFYTAC]-[GSTA]-K-x(2)-[GSALVN]-[LIVMFA]-x-[GNAR]-x-R-[LIVMA]-[GA] [K is the pyridoxal-P attachment site]

[ 1] Bairoch A. Unpublished observations (1992).
[ 2] Sung M.H., Tanizawa K., Tanaka H., Kuramitsu S., Kagamiyama H., Hirotsu K., Okamoto A., Higuchi T., Soda K. J. Biol. Chem. 266:2567-2572(1991).

[0244] 51. Aminotransferases class-II pyridoxal-phosphate attachment site

Aminotransferases share certain mechanistic features with other pyridoxal-phosphate dependent enzymes, such as
the covalent binding of the pyridoxal-phosphate group to a lysine residue. On the basis of sequence similarity, these
various enzymes can be grouped [1] into subfamilies. One of these, called class-II, currently consists of the following
enzymes: - Glycine acetyltransferase (EC 2.3.1.29), which catalyzes the addition of acetyl-CoA to glycine to form
2-amino-3-oxobutanoate (gene kbl). - 5-aminolevulinic acid synthase (EC 2.3.1.37) (delta-ALA synthase), which
catalyzes the first step in heme biosynthesis via the Shemin (or C4) pathway, i.e. the addition of succinyl-CoA to
glycine to form 5-aminolevulinate. - 8-amino-7-oxononanoate synthase (EC 2.3.1.47) (7-KAP synthetase), a bacterial
enzyme (gene bioF) which catalyzes an intermediate step in the biosynthesis of biotin: the addition of
6-carboxy-hexanoyl-CoA to alanine to form 8-amino-7-oxononanoate. - Histidinol-phosphate aminotransferase
(EC 2.6.1.9), which catalyzes the eighth step in the histidine biosynthetic pathway: the transfer of an amino group
from 3-(imidazol-4-yl)-2-oxopropyl phosphate to glutamic acid to form histidinol phosphate and 2-oxoglutarate.
- Serine palmitoyltransferase (EC 2.3.1.50) from yeast (genes LCB1 and LCB2), which catalyzes the condensation of
palmitoyl-CoA and serine to form 3-ketosphinganine. The sequence around the pyridoxal-phosphate attachment site of
this class of enzyme is sufficiently conserved to allow the creation of a specific pattern.

Consensus pattern: T-[LIVMFYW]-[STAG]-K-[SAG]-[LIVMFYWR]-[SAG]-x(2)-[SAG] [K is the pyridoxal-P attachment site]

[ 1] Bairoch A. Unpublished observations (1991). 

[0245] 52. Aminotransferases class-III pyridoxal-phosphate attachment site

Aminotransferases share certain mechanistic features with other pyridoxal-phosphate dependent enzymes, such as
the covalent binding of the pyridoxal-phosphate group to a lysine residue. On the basis of sequence similarity, these
various enzymes can be grouped [1,2] into subfamilies. One of these, called class-III, currently consists of the following
enzymes: - Acetylornithine aminotransferase (EC 2.6.1.11) which catalyzes the transfer of an amino group from
acetylornithine to alpha-ketoglutarate, yielding N-acetyl-glutamic-5-semi-aldehyde and glutamic acid. - Ornithine
aminotransferase (EC 2.6.1.13) which catalyzes

yielding ^arnUl^^^ .«W <™ ornithine to aipha-fcetogfutera.e. 

which catalyzes transamination between a v^of^ ? m ^r naCK ^ u ^ aminotransferase (EC 2.6.1.18 1 
a pivotal rl in omega JZEET ' ^ ^ W™-*"**. 

nase). which catalyzes the transfer TZn^Z q^TEJ^.TT?'^ < EC26 " 9 > (GABA transami- 
dehyde and glutamic acid. - DAPA anZZ^J^E^^ lT? ^ SUCCinate M 
an intermediate step in the biosynthesis ^TlZTnJ^ 1 ?' , tV*™* (9e " e bioA > which 

7.8-diaminopefcrgLacH^ 

zyme (gene dgdA) that catalyzes the dee*rhnvvi Jh^L u ^ cam ^ x y ,ase (tc 4.1.1.64 ). a Pseudomonas cepacia en- 
ferase yhxA. - Bacillus subtilis aminoTnsfe^^ 

CaerKxrabditiselegansaminotransferaTJoimrp^ " Haemoph,lus aminotransferase HI0949. - 

ofthiscfcssoten^^^ 

S^st 3X^^,2^^ ^ a " d ^na, on the HMM search Ankyrin repeats 
order structure. P " P *' ° f SeCOnda v structures. The repeats associate to form a higher 



[1] A, HolakTA, FEBS Lett 1997;401:127-132. 

[2] Lux SE, John KM. Bennett V. Nature 1990;345:736-739. 



[0247] 54. Aminotransferases class-IV signature 

these various enzymes can be QmSSSSqSKSST SljE? reS,dU °-° n 1 ,he basis * "I"™. sirnLty. 
following enzymes: ' J suo,am,lles - ° ne <* these, called class-IV, currently consists of the 

" "~ic^^^ 

mate, to form leucine and 2-oxoglutaraT 9r ° UP ,rom 4 - me, "y'-2-oxopentanoate to gluta- 

group is known to be attached, in ilvE The re^nTa h^lf , ^ ly f" 9reS,dUe,OWhichlhe Py ridoxa| -P ho sP"ate- 
residues at the C-terminus side of the PlP^e " ^ 38 " S ' 9na,Ure ^ fe ,oca,ed s °™ 40 

Consensus pattern: ^TAGC*x<2> W ^ 

m Ba^h A^^t' ' NiCh °' S 8 P J BaC,eri °' ^4:5317-5323(1992, 
|2J Bairoch A. Unpublished observations (1992). 

iTii toi T 5,eraSeS Cl3SS - V P^^l-phosphate attachment site 

of phosphoserine and 2-oxoqlutara^ to 3 nh f inh? §:1S) ' enZyme Ca,alyzes ,he reversible ^conversion 
Phory lated pathway oSSlSSr' 6 ? 9lU,ama,e " " ** » ,he ™*» *~ 

simifcr to a rabbi, endometrfa, P^r^e^^ "O * highfy 

•erase B - Ser^oxy.ate aminotransferase (EC ^5^?^^ 
torquens. - Serine-pyruvate aminotransferase (EC 2.6.1.51). This enzyme also acts as an alanine-glyoxylate
aminotransferase (EC 2.6.1.44). In vertebrates, it is located in the peroxisomes and/or mitochondria. - Isopenicillin N
epimerase (gene cefD). This enzyme is involved in the biosynthesis of cephalosporin antibiotics and catalyzes the
reversible isomerization of isopenicillin N and penicillin N. - NifS, a protein of the nitrogen fixation operon of some
bacteria and cyanobacteria. The exact function of nifS is not yet known. A highly similar protein has been found in fungi
(gene NFS1 or SPL1). - The small subunit of cyanobacterial soluble hydrogenase (EC 1.12.-.-). - Hypothetical protein
ycbU from Bacillus subtilis. - Hypothetical protein YFL030w from yeast. The sequence around the pyridoxal-phosphate
attachment site of this class of enzyme is sufficiently conserved to allow the creation of a specific pattern.
Consensus pattern: [LIVFYCHT]-[DGH]-[LIVMFYAC]-[LIVMFYA]-x(2)-[GSTAC]-[GSTA]-[HQR]-K-x(4,6)-G-x-[GSAT]-x-[LIVMFYSAC] [K is the pyridoxal-P attachment site]

[ 1] Ouzounis C, Sander C. FEBS Lett. 322:159-164(1993). 
[ 2] Bairoch A. Unpublished observations (1992). 

[ 3] van der Zel A., Lam H.-M., Winkler M.E. Nucleic Acids Res. 17:8379-8379(1989).

[0251] 56. Annexins repeated domain signature

Annexins [1 to 6] are a group of calcium-binding proteins that associate reversibly with membranes. They bind to
phospholipid bilayers in the presence of micromolar free calcium concentration. The binding is specific for calcium and
for acidic phospholipids. Annexins have been claimed to be involved in cytoskeletal interactions, phospholipase
inhibition, intracellular signalling, anticoagulation, and membrane fusion. Each of these proteins consists of an
N-terminal domain of variable length followed by four or eight copies of a conserved segment of sixty-one residues.
The repeat (sometimes known as an 'endonexin fold') consists of five alpha-helices that are wound into a right-handed
superhelix [7]. The proteins known to belong to the annexin family are listed below: - Annexin I (Lipocortin 1)
(Calpactin 2) (p35) (Chromobindin 9). - Annexin II (Lipocortin 2) (Calpactin 1) (Protein I) (p36) (Chromobindin 8).
- Annexin III (Lipocortin 3) (PAP-III). - Annexin IV (Lipocortin 4) (Endonexin I) (Protein II) (Chromobindin 4).
- Annexin V (Lipocortin 5) (Endonexin 2) (VAC-alpha) (Anchorin CII) (PAP-I). - Annexin VI (Lipocortin 6) (Protein III)
(Chromobindin 20) (p68) (p70). This is the only known annexin that contains 8 (instead of 4) repeats. - Annexin VII
(Synexin). - Annexin VIII (vascular anticoagulant-beta) (VAC-beta). - Annexin IX from Drosophila. - Annexin X from
Drosophila. - Annexin XI (Calcyclin-associated annexin) (CAP-50). - Annexin XII from Hydra vulgaris. - Annexin XIII
(Intestine-specific annexin) (ISA). The signature pattern for this domain spans positions 9 to 61 of the repeat and
includes the only perfectly conserved residue (an arginine in position 22).

Consensus pattern: [TG]-[STV]-x(8)-[LIVMF]-x(2)-R-x(3)-[DEQNH]-x(7)-[IFY]-x(7)-[LIVMF]-x(3)-[LIVMF]-x(11)-[LIVMFA]-x(2)-[LIVMF]
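
The statement that the annexin signature spans positions 9 to 61 of the repeat, with the conserved arginine at position 22, can be checked arithmetically from the pattern itself. The sketch below assumes the pattern reads [TG]-[STV]-x(8)-[LIVMF]-x(2)-R-x(3)-[DEQNH]-x(7)-[IFY]-x(7)-[LIVMF]-x(3)-[LIVMF]-x(11)-[LIVMFA]-x(2)-[LIVMF]; the helper names are hypothetical and serve only to illustrate the calculation.

```python
import re

# Assumed reading of the annexin repeat signature (fixed-length elements only).
ANNEXIN_PATTERN = (
    "[TG]-[STV]-x(8)-[LIVMF]-x(2)-R-x(3)-[DEQNH]-x(7)-[IFY]-"
    "x(7)-[LIVMF]-x(3)-[LIVMF]-x(11)-[LIVMFA]-x(2)-[LIVMF]"
)


def element_lengths(pattern):
    """Return (element, length) pairs for a fixed-length PROSITE-style pattern."""
    pairs = []
    for element in pattern.split("-"):
        m = re.fullmatch(r"(.+?)(?:\((\d+)\))?", element)
        pairs.append((m.group(1), int(m.group(2) or 1)))
    return pairs


def signature_span(pattern, first_position):
    """Return (first, last) repeat positions covered by the pattern."""
    total = sum(length for _, length in element_lengths(pattern))
    return first_position, first_position + total - 1


def residue_position(pattern, residue, first_position):
    """Return the repeat position of the first element equal to `residue`."""
    offset = 0
    for core, length in element_lengths(pattern):
        if core == residue:
            return first_position + offset
        offset += length
    raise ValueError(f"{residue!r} is not a fixed element of the pattern")


print(signature_span(ANNEXIN_PATTERN, 9))         # span of the signature within the repeat
print(residue_position(ANNEXIN_PATTERN, "R", 9))  # position of the conserved arginine
```

Under the assumed reading, the pattern is 53 residues long, so a signature starting at position 9 ends at position 61 and places the arginine at position 22, in agreement with the description above.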

[ 1] Raynal P., Pollard H.B. Biochim. Biophys. Acta 1197:63-93(1994).
[ 2] Barton G.J., Newman R.H., Freemont P.S., Crumpton M.J. Eur. J. Biochem. 198:749-760(1991).
[ 3] Burgoyne R.D., Geisow M.J. Cell Calcium 10:1-10(1989).
[ 4] Haigler H.T., Fitch J.M., Jones J.M., Schlaepfer D.D. Trends Biochem. Sci. 14:48-50(1989).
[ 5] Klee C.B. Biochemistry 27:6645-6653(1988).
[ 6] Smith P.D., Moss S.E. Trends Genet. 10:241-246(1994).
[ 7] Huber R., Roemisch J., Paques E.-P. EMBO J. 9:3867-3874(1990).
[ 8] Fiedler K., Simons K. Trends Biochem. Sci. 20:177-178(1995).

[0252] 57. (arf_1) ADP-ribosylation factors family signature

ADP-ribosylation factors (ARF) [1,2,3,4] are 20 Kd GTP-binding proteins involved in protein trafficking. They may
modulate vesicle budding and uncoating within the Golgi apparatus. ARF's also act as allosteric activators of cholera
toxin ADP-ribosyltransferase activity. They are evolutionarily conserved and present in all eukaryotes. At least six
forms of ARF are present in mammals and three in budding yeast. The ARF family also includes proteins highly related to
ARF's but which lack the cholera toxin cofactor activity; they are collectively known as ARL's (ARF-like). ARD1 is a
64 Kd mammalian protein of unknown biological function that contains an ARF domain at its C-terminal extremity.
Proteins from the ARF family are generally included in the RAS 'superfamily' of small GTP-binding proteins [5], but
they are only slightly related to the other RAS proteins. They also differ from RAS proteins in that they lack cysteine
residues at their C-termini and are therefore not subject to prenylation. The ARFs are N-terminally myristoylated (the
ARLs have not yet been shown to be modified in such a fashion). A conserved region in the C-terminal part of ARF's and
ARL's has been selected as a signature pattern.

Consensus pattern: [HRQT]-x-[FYWI]-x-[LIVM]-x(4)-A-x(2)-G-x(2)-[LIVM]-x(2)-[GSA]-[LIVMF]-x-[WK]-[LIVM]

Note: proteins belonging to this family also contain a copy of the ATP/GTP-binding motif 'A' (P-loop) (see <PDOC00017>).

[ 1] Boman A.L., Kahn R.A. Trends Biochem. Sci. 20:147-150(1995).
[ 2] Moss J., Vaughan M. Cell. Signal. 4:367-399(1993).
[ 3] Moss J., Vaughan M. Prog. Nucleic Acid Res. Mol. Biol. 45:47-65(1993).
[ 4] Amor J.C., Harrison D.H., Kahn R.A., Ringe D. Nature 372:704-708(1994).
[ 5] Valencia A., Chardin P., Wittinghofer A., Sander C. Biochemistry 30:4637-4648(1991).

[0253] (arf_2) ATP/GTP-binding site motif A (P-loop) 

From sequence comparisons and crystallographic data analysis it has been shown [1,2,3,4,5,6] that an appreciable
proportion of proteins that bind ATP or GTP share a number of more or less conserved sequence motifs. The best
conserved of these motifs is a glycine-rich region, which typically forms a flexible loop between a beta-strand and an
alpha-helix. This loop interacts with one of the phosphate groups of the nucleotide. This sequence motif is generally
referred to as the 'A' consensus sequence [1] or the 'P-loop' [5]. There are numerous ATP- or GTP-binding proteins in
which the P-loop is found. A number of protein families for which the relevance of the presence of such a motif has
been noted are listed below:

SSocSSr G— T Pr °! einS ^ <PDOC00343>) - aruTdynamin-I pr""^ 

<PDOC00362 >^-Guanylate kinase (see <PDOC00670>). - Thymidine kinase (see <PDOC00524>) - Thymidilate 

SS^JLS kinaS8 ^ 9 1 PDOC00868 »- - NKrogenaselrl^^ 
(see <PDOC00580 >). - ATP-bindaig proteins evolved in "active transport" (ABC transporters) m fsee <PDOCooir>;->a 
^AandRNAheHcases^ 

of GTP-b,nd,ng proteins (Ras. Rho. Rab. Hal, Ypt1, SEC4, etc.). - Nuclear protein ran (see ^DOcSss S 
r.bosylat^actorsfam.ly(see<PDC«^>).-BacterialdnaAprotein 

siss^ e?r ri DN r r pro,e ;? ee <pdocoo539> >- - ^^sdi^sxs^ 

unrts (Gi, Gs. Gt, GO. etc.). - DNA mismatch repair proteins mutS family (See <PDOC00388>) - Bacterial tvoe II 
secretin system protein E (see <PDOC00567». No t all ATP- or GTP-binoing protel^slrTpled-upt ST A 

ofle P i°J£T nS TT ^ ti0n beCaUSe *° S,rUC,Ufe <* meir ATP " bindi "9 site is «4'*tey !Z*£Z I* 

protons the flexible loop exists n a slightly different form; this is the case for tubulins or protein kinases A SDecTal 

p^gTi^^^^^ 

Consensus pattern: [AG]-x(4)-G-K-[ST]
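
Because the P-loop pattern is short, candidate sites can overlap within a glycine-rich stretch. The following sketch, offered only as an illustration, scans a sequence for every occurrence of [AG]-x(4)-G-K-[ST], including overlapping ones, and reports the position of the lysine in each hit. The function name and both test sequences are hypothetical.

```python
import re

# Regex form of [AG]-x(4)-G-K-[ST]. Wrapping it in a lookahead makes the scan
# advance one residue at a time, so overlapping hits are all reported.
P_LOOP = re.compile(r"(?=([AG].{4}GK[ST]))")


def find_p_loops(sequence):
    """Return (start, motif, lysine_position) for each hit, using 1-based positions."""
    hits = []
    for m in P_LOOP.finditer(sequence.upper()):
        start = m.start() + 1
        # The lysine is the seventh residue of the eight-residue motif.
        hits.append((start, m.group(1), start + 6))
    return hits


# Illustrative sequences (made up for this example).
print(find_p_loops("MTEYKLVVVGAGGVGKSALT"))
print(find_p_loops("GAAAAGKSAGKT"))
```

The first sequence yields a single hit starting at residue 10 with the lysine at residue 16; the second yields two overlapping hits starting at residues 1 and 5. As the text notes, a short pattern of this kind also matches sequences that do not bind nucleotides, so hits should be treated as candidates only.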

[ 1] Walker J.E., Saraste M., Runswick M.J., Gay N.J. EMBO J. 1:945-951(1982).
[ 2] Moller W., Amons R. FEBS Lett. 186:1-7(1985).
[ 3] Fry D.C., Kuby S.A., Mildvan A.S. Proc. Natl. Acad. Sci. U.S.A. 83:907-911(1986).
[ 4] Dever T.E., Glynias M.J., Merrick W.C. Proc. Natl. Acad. Sci. U.S.A. 84:1814-1818(1987).
[ 5] Saraste M., Sibbald P.R., Wittinghofer A. Trends Biochem. Sci. 15:430-434(1990).
[ 6] Koonin E.V. J. Mol. Biol. 229:1165-1174(1993).

[ 7] Higgins C.F., Hyde S.C., Mimmack M.M., Gileadi U., Gill D.R., Gallagher M.P. J. Bioenerg. Biomembr. 22:571-592(1990).
[ 8] Hodgman T.C. Nature 333:22-23(1988) and Nature 333:578-578(1988) (Errata).
[ 9] Linder P., Lasko P., Ashburner M., Leroy P., Nielsen P.J., Nishi K., Schnier J., Slonimski P.P. Nature 337:121-122(1989).
[10] Gorbalenya A.E., Koonin E.V., Donchenko A.P., Blinov V.M. Nucleic Acids Res. 17:4713-4730(1989). 
[0254] 58. Arginase family signatures 

The following enzymes have been shown [1] to be evolutionarily related: - Arginase (EC 3.5.3.1), a ubiquitous enzyme
which catalyzes the degradation of arginine to ornithine and urea [2]. - Agmatinase (EC 3.5.3.11) (agmatine
ureohydrolase), a prokaryotic enzyme (gene speB) that catalyzes the hydrolysis of agmatine into putrescine and urea.
- Formiminoglutamase (EC 3.5.3.8) (formiminoglutamate hydrolase), a prokaryotic enzyme (gene hutG) that hydrolyzes
N-formimino-glutamate into glutamate and formamide. - Hypothetical proteins from methanogenic archaebacteria. Three
conserved regions containing residues which are involved in the binding of the two manganese ions [3] can be used as
signature patterns.
Consensus pattern: [LIVMF]-G-G-x-H-x-[LIVMT]-[STAV]-x-[PAG]-x(3)-[GSTA] [H binds manganese]
Consensus pattern: [LIVM](2)-x-[LIVMFY]-D-[AS]-H-x-D [The two D's and the H bind manganese]
Consensus pattern: [ST]-[LIVMFY]-D-[LIVM]-D-x(3)-[PAQ]-x(3)-P-[GSA]-x(7)-G [The two D's bind manganese]
[ 1] Ouzounis C., Kyrpides N.C. J. Mol. Evol. 39:101-104(1994).
[ 2] Jenkinson C.P., Grody W.W., Cederbaum S.D. Comp. Biochem. Physiol. 114B:107-132(1996).
[ 3] Kanyo Z.F., Scolnick L.R., Ash D.E., Christianson D.W. Nature 383:554-557(1996).
[0255] 59. (asp) Eukaryotic and viral aspartyl proteases active site 

Aspartyl proteases, also known as acid proteases, (EC 3.4.23.-) are a widely distributed family of proteolytic enzymes
[1,2,3] known to exist in vertebrates, fungi, plants, retroviruses and some plant viruses. Aspartate proteases of
eukaryotes are monomeric enzymes which consist of two domains. Each domain contains an active site centered on a
catalytic aspartyl residue. The two domains most probably evolved from the duplication of an ancestral gene encoding a
primordial domain. Currently known eukaryotic aspartyl proteases are: - Vertebrate gastric pepsins A and C (also known
as gastricsin). - Vertebrate chymosin (rennin), involved in digestion and used for making cheese. - Vertebrate
lysosomal cathepsins D (EC 3.4.23.5) and E (EC 3.4.23.34). - Mammalian renin (EC 3.4.23.15) whose function is to
generate angiotensin I from angiotensinogen in the plasma. - Fungal proteases such as aspergillopepsin A
(EC 3.4.23.18), candidapepsin (EC 3.4.23.24), mucoropepsin (EC 3.4.23.23) (mucor rennin), endothiapepsin
(EC 3.4.23.22), polyporopepsin (EC 3.4.23.29), and rhizopuspepsin (EC 3.4.23.21). - Yeast saccharopepsin
(EC 3.4.23.25) (proteinase A) (gene PEP4). PEP4 is implicated in posttranslational regulation of vacuolar hydrolases.
- Yeast barrier pepsin (EC 3.4.23.35) (gene BAR1); a protease that cleaves alpha-factor and thus acts as an antagonist
of the mating pheromone. - Fission yeast sxa1 which is involved in degrading or processing the mating pheromones. Most
retroviruses and some plant viruses, such as badnaviruses, encode an aspartyl protease which is a homodimer of a chain
of about 95 to 125 amino acids. In most retroviruses, the protease is encoded as a segment of a polyprotein which is
cleaved during the maturation process of the virus. It is generally part of the pol polyprotein and, more rarely, of
the gag polyprotein. Conservation of the sequence around the two aspartates of eukaryotic aspartyl proteases and around
the single active site of the viral proteases allows us to develop a single signature pattern for both groups of
protease.
Consensus pattern: [LIVMFGAC]-[LIVMTADN]-[LIVFSA]-D-[ST]-G-[STAV]-[STAPDENQ]-x-[LIVMFSTNC]-x-[LIVMFGTA]
[D is the active site residue] Note: these proteins belong to families A1 and A2 in the classification of peptidases
[4,E1].
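
Since eukaryotic aspartyl proteases carry one active-site aspartate in each of their two domains while the viral enzymes carry one per chain, the number of signature hits in a sequence gives a rough first indication of which arrangement is present. The sketch below is an illustrative heuristic only, assuming the pattern reads as given above; the function names and the synthetic test sequences are hypothetical, and hit counts are not a substitute for a proper domain analysis.

```python
import re

# Regex form of the aspartyl protease active-site signature.
ASP_SITE = re.compile(
    r"[LIVMFGAC][LIVMTADN][LIVFSA]D[ST]G[STAV][STAPDENQ].[LIVMFSTNC].[LIVMFGTA]"
)


def active_site_aspartates(sequence):
    """Return 1-based positions of the aspartate in every signature hit."""
    # The aspartate is the fourth residue of the signature.
    return [m.start() + 4 for m in ASP_SITE.finditer(sequence.upper())]


def describe(sequence):
    """Summarise the hit count as a rough, non-definitive indication."""
    count = len(active_site_aspartates(sequence))
    if count >= 2:
        return "two or more hits: consistent with a two-domain eukaryotic-type enzyme"
    if count == 1:
        return "one hit: consistent with a single-site chain such as a viral protease"
    return "no hit"


# Synthetic sequences built around motif-like segments, for illustration only.
two_domain_like = "MK" + "VVFDTGSSNLWV" + "P" * 10 + "AIVDTGTSLLTG" + "KK"
single_site_like = "PQ" + "ALLDTGADDTVL" + "EE"

print(active_site_aspartates(two_domain_like), describe(two_domain_like))
print(active_site_aspartates(single_site_like), describe(single_site_like))
```

For the synthetic sequences above, the first contains two hits (aspartates at residues 6 and 28) and the second contains one (aspartate at residue 6).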

[ 1] Foltmann B. Essays Biochem. 17:52-84(1981). 

[ 2] Davies D.R. Annu. Rev. Biophys. Chem. 19:189-215(1990). 

[ 3] Rao J.K.M., Erickson J.W., Wlodawer A. Biochemistry 30:4663-4671(1991).
[ 4] Rawlings N.D., Barrett A.J. Meth. Enzymol. 248:105-120(1995).

[0256] 60. (BIRA) Biotin repressor 

[1] Wilson KP, Shewchuk LM, Brennan RG, Otsuka AJ, Matthews BW; Proc Natl Acad Sci U S A 1992;89:9257-9261.
[0257] 61. BTB/POZ domain 

The BTB (for BR-C, ttk and bab) [1] or POZ (for Pox virus and Zinc finger) [2] domain is present near the N-terminus
of a fraction of zinc finger (zf-C2H2) proteins and in proteins that contain the Kelch motif, such as Kelch and a
family of pox virus proteins. The BTB/POZ domain mediates homomeric dimerisation and in some instances heteromeric
dimerisation [2]. The structure of the dimerised PLZF BTB/POZ domain has been solved and consists of a tightly
intertwined homodimer. The central scaffolding of the protein is made up of a cluster of alpha-helices flanked by
short beta-sheets at both the top and bottom of the molecule [3]. POZ domains from several zinc finger proteins have
been shown to mediate transcriptional repression and to interact with components of histone deacetylase co-repressor
complexes including N-CoR and SMRT [4,5,6]. The POZ or BTB domain is also known as BR-C/Ttk or ZiN.

[1] Zollman S, Godt D, Prive GG, Couderc JL, Laski FA; Proc Natl Acad Sci U S A 1994;91:10717-10721.
[2] Bardwell VJ, Treisman R; Genes Dev 1994;8:1664-1677.
[3] Ahmad KF, Engel CK, Prive GG; Proc Natl Acad Sci U S A 1998;95:12123-12128.
[4] Deweindt C, Albagli O, Bernardin F, Dhordain P, Quief S, Lantoine D, Kerckaert JP, Leprince D; Cell Growth Differ 1995;6:1495-1503.
[5] Huynh KD, Bardwell VJ; Oncogene 1998;17:2473-2484.
[6] Wong CW, Privalsky ML; J Biol Chem 1998;273:27695-27702.

[0258] 62. (Bac GSPproteins) Bacterial type II secretion system protein D signature 

A number of bacterial proteins, some of which are involved in a general secretion pathway (GSP) for the export of
proteins (also called the type II pathway) [1 to 5], have been found to be evolutionarily related. These proteins are
listed below: - The 'D' protein from the GSP operon of: Aeromonas (gene exeD); Erwinia (gene outD); Escherichia coli
(gene yheF); Klebsiella pneumoniae (gene pulD); Pseudomonas aeruginosa (gene xcpQ); Vibrio cholerae (gene epsD) and
Xanthomonas campestris (gene xpsD). - comE from Haemophilus influenzae, involved in competence (DNA uptake).
- pilQ from Pseudomonas aeruginosa, which is essential for the formation of the pili. - hofQ (hopQ) from Escherichia
coli. - hrpH from Pseudomonas syringae, which is involved in the secretion of a proteinaceous elicitor of the
hypersensitivity response in plants. - hrpA1 from Xanthomonas campestris pv. vesicatoria, which is also involved in the
hypersensitivity response. - mxiD from Shigella flexneri, which is involved in the secretion of the Ipa invasins which
are necessary for penetration of intestinal epithelial cells. - omc from Neisseria gonorrhoeae. - yscC from Yersinia
enterocolitica virulence plasmid pYV, which seems to be required for the export of the Yop virulence proteins. - The
gpIV protein from filamentous phages such as f1, ike, or m13. GpIV is said to be involved in phage assembly and
morphogenesis. These proteins all seem to start with a signal sequence and are thought to be integral proteins in the
outer membrane. As a signature pattern, a conserved region in the C-terminal section of these proteins has been
selected.
Consensus pattern: [GR]-[DEQKG]-[STVM]-[LI...

[GSAE]-x-[LIVM]-P-[LIVMFYW](2)-x(2)-[LV]-F

[ 1] Salmond G.P.C., Reeves P.J. Trends Biochem. Sci. 18:7-12(1993).
[ 2] Reeves P.J., Whitcombe D., Wharam S., Gibson M., Allison G., Bunce N., Barallon R., Douglas P., Mulholland V., Stevens S., Walker S., Salmond G.P.C. Mol. Microbiol. 8:443-456(1993).
[ 3] Martin P.R., Hobbs M., Free P.D., Jeske Y., Mattick J.S. Mol. Microbiol. 9:857-868(1993).
[ 4] Hobbs M., Mattick J.S. Mol. Microbiol. 10:233-243(1993).
[ 5] Genin S., Boucher C.A. Mol. Gen. Genet. 243:112-118(1994).

[0259] 63. (Bac globin) Protozoan/cyanobacterial globins signature 

Globins are heme-containing proteins involved in binding and/or transporting oxygen [1]. Almost all globins belong to
a large family (see <PDOC00793>); the only exceptions are the following proteins, which form a family of their own
[2,3]: - Monomeric hemoglobins from the protozoans Paramecium caudatum, Tetrahymena pyriformis and Tetrahymena
thermophila. - Cyanoglobin from the cyanobacterium Nostoc commune. - Globins LI637 and LI410 from the chloroplast
of the alga Chlamydomonas eugametos. - Mycobacterium tuberculosis hypothetical protein MtCY48.23. These proteins
contain a conserved histidine which could be involved in heme-binding. As a signature pattern, a conserved region
that ends with this residue was used.

Consensus pattern: F-[LF]-x(5)-G-[PA]-x(4)-G-[KRA]-x-[LIVM]-x(3)-H

[ 1] Concise Encyclopedia Biochemistry, Second Edition, Walter de Gruyter, Berlin New-York (1988).
[ 2] Takagi T. Curr. Opin. Struct. Biol. 3:413-418(1993). 

[ 3] Couture M., Chamberland H., St-Pierre B., Lafontaine J., Guertin M.; Mol. Gen. Genet. 243:185-197(1994). 
[0260] 64. Band 7 protein family signature 

Mammalian band 7 protein [1] (also known as 7.2B or stomatin) is an integral membrane phosphoprotein of red blood
cells thought to regulate cation conductance by interacting with other proteins of the junctional complex of the
membrane skeleton. Structurally, band 7 is evolutionarily related to the following proteins: - Caenorhabditis elegans
protein mec-2 [2]. Mec-2 positively regulates the activity of the putative mechanosensory transduction channel. It may
link the mechanosensory channel and the microtubule cytoskeleton of the touch receptor neurons. - Caenorhabditis
elegans proteins sto-1 to sto-4. - Caenorhabditis elegans protein unc-1. - Escherichia coli hypothetical protein ybbK.
- Mycobacterium tuberculosis hypothetical protein MtCY277.09. - Synechocystis strain PCC 6803 hypothetical protein
slr1128. - Methanococcus jannaschii hypothetical protein MJ0827. Structurally, all these proteins consist of a short
N-terminal domain which is followed by a transmembrane region and a variable size (from 170 to 350 residues) C-terminal
domain. As a signature pattern, a conserved region located about 110 residues after the transmembrane domain was
selected.
Consensus pattern: R-x(2)-[LIV]-[SAN]-x(6)-[LIV)-D-x(2)-T-x(2)-W-G-[LIV]- [KRH]-[LTV]-x-[KR]-[LIV]-E-[UV]-[KR]- 

( 1 J Gallagher P.G., Forget B.G. J. Biol. Chem. 270:26358-26363(19951 
[ 2] Huang M., Gu G., Ferguson E.L, Chalfie M. Nature 378:292-295(1995). 

[0261] 65. Barwin domain signatures 

Barwin [1] is a barley seed protein of 125 residues that weakly binds a chitin analog. It contains six cysteines involved in disulfide bonds, as shown in the following schematic representation.

[Schematic representation of the disulfide bonds: not recoverable from the source.]

'C': conserved cysteine involved in a disulfide bond. '*': position of the patterns.

Barwin is closely related to the following proteins:

- Hevein, a wound-induced protein found in the latex of rubber trees.
- HEL, an Arabidopsis thaliana hevein-like protein [2].
- Win1 and win2, two wound-induced proteins from potato.
- Pathogenesis-related protein 4 from tobacco.

Hevein and the win1/2 proteins consist of an N-terminal chitin-binding domain followed by a barwin-like C-terminal domain. Barwin and its related proteins could be involved in a defense mechanism in plants. As signature patterns, two highly conserved regions that contain some of the cysteines were selected.
Consensus pattern: C-G-[KR]-C-L-x-V-x-N [The two C's are involved in disulfide bonds]
Consensus pattern: V-[DN]-Y-[EQ]-F-V-[DN]-C [C is involved in a disulfide bond]

[ 1] Svensson B., Svendsen I., Hoejrup P., Roepstorff P., Ludvigsen S., Poulsen F.M. Biochemistry 31:8767-8770(1992).

[ 2] Potter S., Uknes S., Lawton K., Winter A.M., Chandler D., Dimaio J., Novitzky R., Ward E., Ryals J. Mol. Plant Microbe Interact. 6:680-685(1993).

[0262] 66. (Bowman-Birk leg) Bowman-Birk serine protease inhibitors family signature 

The Bowman-Birk inhibitor family [1] is one of the numerous families of serine proteinase inhibitors. As can be seen in the schematic representation, they have a duplicated structure and generally possess two distinct inhibitory sites:

xxCCxxCxxCxx#xxCxxCxxxxCxxxCxxxCxxxxCxx#xxCxxCxxCxxCxx

< 70 residues >

[Lines marking the disulfide bonds and the position of the pattern: not recoverable from the source.]

'C': conserved cysteine involved in a disulfide bond.
'#': active site residue.
'*': position of the pattern.

[0263] These inhibitors are found in the seeds of all leguminous plants as well as in cereal grains. In cereals they exist in two forms, one of which is a duplication of the basic structure shown above [2]. The pattern that was developed to pick up sequences belonging to this family of inhibitors is in the central part of the domain and includes four cysteines.
[0264] Consensus pattern: C-x(5,6)-[DENQKRHSTA]-C-[PASTDH]-[PASTDK]-[ASTDV]-C-[NDKS]-[DEKRHSTA]-C
[The four C's are involved in disulfide bonds] Note: this pattern can be found twice in some duplicated cereal inhibitors.
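
The note that the pattern can occur twice in duplicated cereal inhibitors can be checked by counting non-overlapping matches. The sketch below is illustrative only: the regular expression is an assumed reading of the consensus pattern (in particular, the last two bracketed elements are taken to be [NDKS] followed by [DEKRHSTA]), and the toy sequences and the name `count_sites` are hypothetical, not taken from any real inhibitor.

```python
import re

# Assumed regex reading of the Bowman-Birk consensus pattern.
# x(5,6) becomes .{5,6}; each bracketed set becomes a character class.
BOWMAN_BIRK = re.compile(
    r"C.{5,6}[DENQKRHSTA]C[PASTDH][PASTDK][ASTDV]C[NDKS][DEKRHSTA]C"
)


def count_sites(sequence: str) -> int:
    """Count non-overlapping occurrences of the pattern in a sequence."""
    return sum(1 for _ in BOWMAN_BIRK.finditer(sequence))


# Hypothetical toy sequences built so that each unit satisfies the pattern.
unit = "C" + "AAAAA" + "D" + "C" + "P" + "P" + "A" + "C" + "N" + "D" + "C"
single = "GG" + unit + "GG"
duplicated = "GG" + unit + "GGGG" + unit + "GG"

print(count_sites(single), count_sites(duplicated))
```

A sequence with one copy of the domain gives one hit, and a duplicated arrangement gives two, which is the behavior the note describes.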

[ 1] Laskowski M., Kato I. Annu. Rev. Biochem. 49:593-626(1980).

[ 2] Tashiro M., Hashino K., Shiozaki M., Ibuki F., Maki Z. J. Biochem. 102:297-306(1987).

[0265] 67. Pathogenesis-related protein Bet v I family signature 

[0266] A number of plant proteins, which all seem to be involved in pathogen defense response, are structurally 
related [1,2,3]. These proteins are: 

- Bet v I, the major pollen allergen from white birch. Bet v I is the main cause of type I allergic reactions in Europe, North America and USSR.
- Aln g I, the major pollen allergen from alder.
- Api g I, the major allergen from celery.
- Car b I, the major pollen allergen from hornbeam.
- Cor a I, the major pollen allergen from hazel.
- Mal d I, the major pollen allergen from apple.
- Asparagus wound-induced protein AoPR1.
- Kidney bean pathogenesis-related proteins 1 and 2.
- Parsley pathogenesis-related proteins PR1-1 and PR1-3.
- Pea disease resistance response proteins pI49, pI176 and DRRG49-C.
- Pea abscisic acid-responsive proteins ABR17 and ABR18.
- Potato pathogenesis-related proteins STH-2 and STH-21.
- Soybean stress-induced protein SAM22.



[ 1] Breiteneder H., Pettenburger K., Bito A., Valenta R., Kraft D., Rumpold H., Scheiner O., Breitenbach M. EMBO J. 8:1935-1938(1989).
[ 2] Crowell D.N., John M.E., Russell D., Amasino R.M. Plant Mol. Biol. 18:459-466(1992).
[ 3] Warner S.A.J., Scott R., Draper J. Plant Mol. Biol. 19:555-561(1992).

[0268] 68. bZIP transcription factors basic domain signature

[...] groups together [...]. This family is quite large; therefore only a partial list of [...] is given here.

- Transcription factor AP-1, which binds selectively to enhancer elements [...]. AP-1, also known as c-jun, is the cellular [...].
- Jun-B and jun-D, probable transcription factors which are highly similar to jun [...].
- [...] non-covalent dimer with c-jun.
- The fos-related proteins [...].
- Mammalian cAMP response element (CRE) binding proteins CREB, CREM, ATF-1, ATF-3, ATF-4, ATF-5, [...].
- [...] Opaque 2, a trans-acting transcriptional activator involved in the regulation of the production of [...] during endosperm [...].
- Arabidopsis G-box binding factors GBF1 to GBF4, parsley CPRF-1 to [...] and [...] EMBP-1. All these proteins bind the G-box promoter elements of [...] genes.
- [...] Giant, which represses the expression of both the kruppel and [...] genes.
- [...] 2 (BBF-2), a transcriptional activator that binds to [...] and [...] protein genes.
- Drosophila segmentation protein cap'n'collar [...].
- Caenorhabditis elegans skn-1, a developmental protein [...] blastomeres in the early embryo.
- Yeast GCN4 transcription factor, a component of the general control system [...] enzymes in response to amino acid starvation [...].
- Neurospora crassa cys-3, which turns on the expression of [...].



Pyruvate carboxylase (EC 6.4.1.1).
Acetyl-CoA carboxylase (EC 6.4.1.2).
Propionyl-CoA carboxylase (EC 6.4.1.3).
Methylcrotonoyl-CoA carboxylase (EC 6.4.1.4).
Geranoyl-CoA carboxylase (EC 6.4.1.5).
Urea carboxylase (EC 6.3.4.6).
Oxaloacetate decarboxylase (EC 4.1.1.3).
Methylmalonyl-CoA decarboxylase (EC 4.1.1.41).
Glutaconyl-CoA decarboxylase (EC 4.1.1.70).



[...] the lipoyl-binding lysine residue of 2-oxo acid [...].

[ 1] Knowles J.R. Annu. Rev. Biochem. 58:195-221(1989).
[ 2] Samols D., Thornton C.G., Murtif V.L., Kumar G.K., Haase F.C., Wood H.G. J. Biol. Chem. 263:6461-6464(1988).

[ 3] Goss N.H., Wood H.G. Meth. Enzymol. 107:261-278(1984).

[ 4] Shenoy B.C., Xie Y., Park V.L., Kumar G.K., Beegen H., Wood H.G., Samols D. J. Biol. Chem. 267:18407-18412(1992).

[0271] 2-oxo acid dehydrogenases acyltransferase component lipoyl binding site 

The 2-oxo acid dehydrogenase multienzyme complexes [1,2] from bacterial and eukaryotic sources catalyze the oxidative decarboxylation of 2-oxo acids to the corresponding acyl-CoA. The three members of this family of multienzyme complexes are:

Pyruvate dehydrogenase complex (PDC).
2-oxoglutarate dehydrogenase complex (OGDC).
Branched-chain 2-oxo acid dehydrogenase complex (BCOADC).

These three complexes share a common architecture: they are composed of multiple copies of three component enzymes, E1, E2 and E3. E1 is a thiamine pyrophosphate-dependent 2-oxo acid dehydrogenase, E2 a dihydrolipamide acyltransferase, and E3 an FAD-containing dihydrolipamide dehydrogenase.

[0272] E2 acyltransferases have an essential cofactor, lipoic acid, which is covalently bound via an amide linkage to a lysine group. The E2 components of OGDC and BCOADC bind a single lipoyl group, while those of PDC bind either one (in yeast and in Bacillus), two (in mammals), or three (in Azotobacter and in Escherichia coli) lipoyl groups [3]. In addition to the E2 components of the three enzymatic complexes described above, a lipoic acid cofactor is also found in the following proteins:

- H-protein of the glycine cleavage system (GCS) [4]. GCS is a multienzyme complex of four protein components, which catalyzes the degradation of glycine. H-protein shuttles the methylamine group of glycine from the P protein to the T protein. H-protein from either prokaryotes or eukaryotes binds a single lipoic group.
- Mammalian and yeast pyruvate dehydrogenase complexes differ from those of other sources in that they contain, in small amounts, a protein of unknown function, designated protein X or component X. Its sequence is closely related to that of E2 subunits and seems to bind a lipoic group [5].
- Fast migrating protein (FMP) (gene acoC) from Alcaligenes eutrophus [6]. This protein is most probably a dihydrolipamide acyltransferase involved in acetoin metabolism.

A signature pattern was developed which allows the detection of the lipoyl-binding site.
[0273] Consensus pattern: [GN]-x(2)-[LIVF]-x(5)-[...]-x(5)-[GCN]-x-[LIVMFY] [K is the lipoyl-binding site] Note: the domain around the lipoyl-binding lysine residue is evolutionarily related to that around the biotin-binding lysine residue of biotin-requiring enzymes.

[ 1] Yeaman S.J. Biochem. J. 257:625-632(1989).

[ 2] Yeaman S.J. Trends Biochem. Sci. 11:293-296(1986).

[ 3] Russel G.C., Guest J.R. Biochim. Biophys. Acta 1076:225-232(1991).

[ 4] Fujiwara K., Okamura-Ikeda K., Motokawa Y. J. Biol. Chem. 261:8836-8841(1986).

[ 5] Behal R.H., Browning K.S., Hall T.B., Reed L.J. Proc. Natl. Acad. Sci. U.S.A. 86:8732-8736(1989).

[ 6] Priefert H., Hein S., Krueger N., Zeh K., Schmidt B., Steinbuechel A. J. Bacteriol. 173:4056-4071(1991).

[0274] 70. C2 (C2 domain) Number of members: 295 

Some isozymes of protein kinase C (PKC) [1,2] contain a domain, known as C2, of about 116 amino-acid residues which is located between the two copies of the C1 domain (that bind phorbol esters and diacylglycerol) (see <PDOC00379>) and the protein kinase catalytic domain (see <PDOC00100>). Regions with significant homology [3,E1] to the C2-domain have been found in the following proteins:

- PKC isoforms alpha, beta and gamma and Drosophila isoforms PKC1 and PKC2.
- PKC isoforms delta, epsilon and eta, Caenorhabditis elegans kin-13 and yeast PKC1 have a C2-like domain at the N-terminal extremity [4].
- Yeast cAMP dependent protein kinase SCH9 contains a C2-like domain.
- Mammalian phosphatidylinositol-specific phospholipase C (PI-PLC) (see <PDOC50007>) isoforms beta, gamma and delta as well as several non-mammalian PI-PLCs have a C2-like domain C-terminal of the catalytic domain.
- Mammalian and plant phosphatidylinositol-3-kinases have a C2-like domain in the central region of the 110 Kd catalytic subunit.






- Rabphilin-3A, a synaptic protein containing two C2 domains [...].
- [...] is the only extracellular protein known to contain a C2 domain.
- Yeast hypothetical protein YML072C has a C2 domain.
- Yeast hypothetical protein YNL087W has three C2 domains.
- [...] protein [...]7A4.7 has two C2 domains.

[...] e.g. binding to inositol-1,3,4,5-tetraphosphate, have been [...] the C2-like [...].

[0275] [1] Medline: 96367095. Extending the C2 domain family: C2s in PKCs delta, epsilon, eta, theta, phospholipases, GAPs and perforin. Ponting CP, Parker PJ; Protein Sci 1996;5:162-166.

[ 1] Azzi A., Boscoboinik D., Hensey C. Eur. J. Biochem. 208:547-557(1992).

[ 2] Stabel S. Semin. Cancer Biol. 5:277-284(1994).

[ 3] Brose N., Hofmann K.O., Hata Y., Suedhof T.C. J. Biol. Chem. 270:25273-25280(1995).
[0276] 71. CAP (CAP protein) Number of members: 11

[...] such as [...]. The function of the C-terminal domain is less clear, but it [...]. [...] conserved in higher eukaryotic organisms [...]. [...] N-terminal extremity and the [...] to a C-terminal region have been developed.

- Consensus pattern: [LIVM](2)-x-R-L-[DE]-x(4)-R-L-E

- Consensus pattern: D-[LIVMFY]-x-E-x-[PA]-x-P-E-Q-[LIVMFY]-K
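
The consensus patterns in this listing follow the PROSITE notation, which maps closely onto regular expressions: `x` is any residue, `[ABC]` is a set of allowed residues, `{ABC}` is a set of excluded residues, and `(n)` or `(n,m)` is a repeat count. As a minimal sketch, the first CAP pattern above could be translated and searched as follows. The function name `prosite_to_regex` and the toy sequence are illustrative assumptions, and the sketch ignores the `<` and `>` terminus anchors of the full notation.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style consensus pattern into a Python regex."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        element = element.strip()
        if not element:
            continue  # tolerate a stray trailing hyphen
        # Split an element into its core and an optional repeat count,
        # e.g. "x(4)" -> ("x", "4", None), "x(8,10)" -> ("x", "8", "10").
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.group(1), m.group(2), m.group(3)
        if core == "x":
            core = "."  # any residue
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"  # excluded residues
        # Bracketed sets and single letters are already valid regex.
        if low and high:
            core += "{%s,%s}" % (low, high)
        elif low:
            core += "{%s}" % low
        parts.append(core)
    return "".join(parts)


pattern = "[LIVM](2)-x-R-L-[DE]-x(4)-R-L-E"
regex = prosite_to_regex(pattern)

# Hypothetical toy sequence containing one occurrence of the motif.
toy_sequence = "AAILQRLDKKKKRLEGG"
match = re.search(regex, toy_sequence)
print(regex)
print(match.start() + 1, match.group())  # 1-based start and matched segment
```

The same function handles ranged spacers such as `x(8,10)`, so it can be reused for the other signatures in this section once their patterns are transcribed cleanly.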

[ 1] Kawamukai M., Gerst J., Field J., Riggs M., Rodgers L., Wigler M., Young D. Mol. Biol. Cell 3:167-180(1992).

[ 2] Yu G., Swiston J., Young D. J. Cell Sci. 107:1671-1678(1994).

[0277] 72. CAP_GLY (CAP-Gly domain) 

CAP stands for cytoskeleton-associated proteins. [...]
[0278] It has been shown [1] that some cytoskeleton-associated proteins (CAP) share the presence of a conserved, glycine-rich domain of about 42 residues, called here CAP-Gly. Proteins known to contain this domain are listed below.

- Restin (also known as cytoplasmic linker protein-170 or CLIP-170), a 160 Kd protein associated with intermediate filaments and that links endocytic vesicles to microtubules. Restin contains two copies of the CAP-Gly domain.
- Vertebrate dynactin (150 Kd dynein-associated polypeptide; DAP) and Drosophila glued, a major component of activator I, a 20S polypeptide complex that stimulates dynein-mediated vesicle transport.
- Yeast protein BIK1, which seems to be required for the formation or stabilization of microtubules during mitosis and for spindle pole body fusion during conjugation.
- Yeast protein NIP100 (NIP80).
- Human protein CKAP1/TFCB, Schizosaccharomyces pombe protein alp11 and Caenorhabditis elegans hypothetical protein F53F4.3. These proteins contain an N-terminal ubiquitin domain (see <PDOC00271>) and a C-terminal CAP-Gly domain.
- Caenorhabditis elegans hypothetical protein M01A8.2.
- Yeast hypothetical protein YNL148c.

Structurally, these proteins are made of three distinct parts: an N-terminal section that is most probably globular and contains the CAP-Gly domain, a large central region predicted to be in an alpha-helical coiled-coil conformation and, finally, a short C-terminal globular domain. The signature for the CAP-Gly domain corresponds to the first 32 residues of the domain and includes five of the six conserved glycines.

- Consensus pattern: G-x(8,10)-[FYW]-x-G-[LIVM]-x-[LIVMFY]-x(4)-G-K-[NH]-x-G-[STAR]-x(2)-G-x(2)-[LY]-F

[ 1] Riehemann K., Sorg C. Trends Biochem. Sci. 18:82-83(1993). 
[0279] 73. (CBD1) 
Cellulose-binding domain, fungal type 

The microbial degradation of cellulose and xylans requires several types of enzymes such as endoglucanases (EC 
3.2.1.4), cellobiohydrolases (EC 3.2.1.91) (exoglucanases), or xylanases (EC 3.2.1.8) [1]. 

[0280] Structurally, cellulases and xylanases generally consist of a catalytic domain joined to a cellulose-binding 
domain (CBD) by a short linker sequence rich in proline and/or hydroxy-amino acids. 

[0281] The CBD of a number of fungal cellulases has been shown to consist of 36 amino acid residues. Enzymes 
known to contain such a domain are: 

Endoglucanase I (gene egl1) from Trichoderma reesei.
Endoglucanase II (gene egl2) from Trichoderma reesei.
Endoglucanase V (gene egl5) from Trichoderma reesei.

Exocellobiohydrolase I (gene CBHI) from Humicola grisea, Neurospora crassa, Phanerochaete chrysosporium, Trichoderma reesei, and Trichoderma viride.
Exocellobiohydrolase II (gene CBHII) from Trichoderma reesei.
Exocellobiohydrolase 3 (gene cel3) from Agaricus bisporus.
Endoglucanases B, C2, F and K from Fusarium oxysporum.

[0282] The CBD domain is found either at the N-terminal (Cbh-II or egl2) or at the C-terminal extremity (Cbh-I, egl1 or egl5) of these enzymes. As shown in the following schematic representation, there are four conserved cysteines in this type of CBD domain, all involved in disulfide bonds.

       +----------------+
       |          +-----|---------+
       |          |     |         |
xxxxxxxCxxxxxxxxxxCxxxxxCxxxxxxxxxCx

'C': conserved cysteine involved in a disulfide bond.
'*': position of the pattern.
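
The schematic above encodes the positions of the four conserved cysteines and the spacing between them. As a rough illustration only, those positions can be read off programmatically and turned into a spacing-only regular expression. This is far looser than a real signature pattern, which also constrains the residues between the cysteines; the variable names are hypothetical.

```python
import re

# Residue line copied from the schematic: 'C' marks a conserved cysteine,
# 'x' marks any other residue.
schematic = "xxxxxxxCxxxxxxxxxxCxxxxxCxxxxxxxxxCx"

# 1-based positions of the conserved cysteines within the domain.
cys_positions = [i + 1 for i, residue in enumerate(schematic) if residue == "C"]

# Number of residues separating each pair of consecutive cysteines.
gaps = [b - a - 1 for a, b in zip(cys_positions, cys_positions[1:])]

# Spacing-only regex: a cysteine, then for each gap that many arbitrary
# residues followed by the next cysteine.
spacing_regex = "C" + "".join(".{%d}C" % gap for gap in gaps)

print(cys_positions)
print(gaps)
print(spacing_regex)
```

Such a spacing filter could serve as a cheap first pass before applying the full consensus pattern, at the cost of many false positives.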

[0283] Such a domain has also been found in a putative polysaccharide binding protein from the red alga Porphyra [...].

[ 1] Gilkes N.R., Henrissat B., Kilburn D.G., Miller R.C. Jr., Warren R.A.J. Microbiol. Rev. 55:303-315(1991).

[...] cystathionine-beta-synthase (CBS) [...]. CBS domains are found in inosine-monophosphate dehydrogenase from all species; however, the CBS domains are not needed for activity. Two CBS domains are found in intracellular loops of several [...]. [...] in this domain of Swiss:P35520 lead to homocystinuria. Number of members: 414

[...] Chambers SP, Caron PR, Murcko MA, Wilson KP; Cell [...].

Discovery of CBS domain

[...] anhydride bond share a conserved sequence region [1,2]. These enzymes are:

- Ethanolaminephosphotransferase (EC 2.7.8.1) from yeast (gene EPT1).
- Diacylglycerol cholinephosphotransferase (EC 2.7.8.2) from yeast (gene CPT1).
- Phosphatidylglycerophosphate synthase (EC 2.7.8.5) (CDP-diacylglycerol-glycerol-3-phosphate 3-phosphatidyltransferase) from bacteria (gene pgsA).
- [...] O-phosphatidyltransferase) from yeast [...].
- [...] synthase (EC 2.7.8.[...]) (CDP-diacylglycerol-inositol 3-phosphatidyltransferase) from yeast [...].

- Consensus pattern: D-G-x(2)-A-R-x(8)-G-x(3)-D-x(3)-D

[1] Medline: 97075020. Two-dimensional 1H [...] phosphatidylglycerophosphate synthase in [...]. [...] JB, Rilfors L, Arvidson G, Lindblom G; Eur J Biochem 1996;241:489-497.

[ 1] Nikawa J.-I., Kodaki T., Yamashita S. J. Biol. Chem. 262:4876-4881(1987).
[ 2] Hjelmstad R.H., Bell R.M. J. Biol. Chem. 266:5094-5134(1991).
32:11507-11515.

[0289] The following FAD flavoprotein oxidoreductases have been found [1,2] to be evolutionarily related. These enzymes, which are called 'GMC oxidoreductases', are listed below.

- Glucose oxidase (EC 1.1.3.4) (GOX) from Aspergillus niger. Reaction catalyzed: glucose + oxygen -> delta-gluconolactone + hydrogen peroxide.
- Methanol oxidase (EC 1.1.3.13) (MOX) from fungi. Reaction catalyzed: methanol + oxygen -> formaldehyde + hydrogen peroxide.
- Choline dehydrogenase (EC 1.1.99.1) (CHD) from bacteria. Reaction catalyzed: choline + unknown acceptor -> betaine aldehyde + reduced acceptor.
- Glucose dehydrogenase (GLD) (EC 1.1.99.10) from Drosophila. Reaction catalyzed: glucose + unknown acceptor -> delta-gluconolactone + reduced acceptor.
- Cholesterol oxidase (CHOD) (EC 1.1.3.6) from Brevibacterium sterolicum and Streptomyces strain SA-COO. Reaction catalyzed: cholesterol + oxygen -> cholest-4-en-3-one + hydrogen peroxide.
- AlkJ [3], an alcohol dehydrogenase from Pseudomonas oleovorans, which converts aliphatic medium-chain-length alcohols into aldehydes.

This family also includes a lyase:

- (R)-mandelonitrile lyase (EC 4.1.2.10) (hydroxynitrile lyase) from plants [4], an enzyme involved in cyanogenesis, the release of hydrogen cyanide from injured tissues.

These enzymes are proteins of size ranging from 556 (CHD) to 664 (MOX) amino acid residues which share a number of regions of sequence similarity. One of these regions, located in the N-terminal section, corresponds to the FAD ADP-binding domain. The function of the other conserved domains is not yet known; two of these domains have been selected as signature patterns. The first one is located in the N-terminal section of these enzymes, about 50 residues after the ADP-binding domain, while the second one is located in the central section.

- Consensus pattern: [GA]-[RKN]-x-[LIV]-G(2)-[GST](2)-x-[...]

- Consensus pattern: [GS]-[PSTA]-x(2)-[ST]-P-x-[LIVM](2)-x(2)-S-G-[LIVM]-G

[ 1] Cavener D.R. J. Mol. Biol. 223:811-814(1992).

[ 2] Henikoff S., Henikoff J.G. Genomics 19:97-107(1994).

[ 3] van Beilen J.B., Eggink G., Enequist H., Bos R., Witholt B. Mol. Microbiol. 6:3121-3136(1992).
[ 4] Cheng I.-P., Poulton J.E. Plant Cell Physiol. 34:1139-1143(1993).

[0290] 77. CKS (Cyclin-dependent kinase regulatory subunit) Number of members: 11. Cyclin-dependent kinases (CDK) are protein kinases which associate with cyclins to regulate eukaryotic cell cycle progression. The best known CDK is p34-cdc2 (CDC28 in yeast), which is required for entry into S-phase and mitosis. CDKs bind to a regulatory subunit which is essential for their biological function. This regulatory subunit is a small protein of 79 to 150 residues. In yeast (gene CKS1) and in fission yeast (gene suc1) a single isoform is known, while mammals have two highly related isoforms. It has been shown [1] that these CDK regulatory subunits assemble as a hexamer which then acts as a hub for the oligomerization of six CDK catalytic subunits. The sequences of CDK regulatory subunits are highly conserved; therefore, the two most conserved regions have been used as signature patterns.

- Consensus pattern: Y-S-x-[KR]-Y-x-[DE](2)-x-[FY]-E-Y-R-H-V-x-[LV]-[PT]-[KRP]

- Consensus pattern: H-x-P-E-x-H-[IV]-L-L-F-[KR]

[0291] [ 1] Parge H.E., Arvai A.S., Murtari D.J., Reed S.I., Tainer J.A. Science 262:387-395(1993).
[0292] 78. CK_II_beta (Casein kinase II regulatory subunit)

Number of members: 16. Casein kinase II (CK-2) [1] is a ubiquitous eukaryotic serine/threonine protein kinase which is found both in the cytoplasm and the nucleus and whose substrates are numerous. It generally phosphorylates Ser or Thr at the N-terminal side of a stretch of acidic residues (see <PDOC00006>). CK-2 exists as a heterotetramer composed of two catalytic subunits (alpha) and two regulatory subunits (beta). In most species there are two closely related isoforms of the catalytic subunit: alpha and alpha'. Some species, such as fungi and plants, express two forms of regulatory subunits: beta and beta'. The exact function of the regulatory subunit is not yet known. It is a highly conserved protein of about 25 Kd that contains, in its central section, a cysteine-rich motif that could be involved in binding a metal such as zinc [2]. This region has been used as a signature pattern.

- Consensus pattern: C-P-x-[LIVMY]-x-C-x(5)-[LI]-P-[LIVMC]-G-x(9)-V-[KR]-x(2)-C-P-x-C
[ 1] Allende J.E., Allende C.C. FASEB J. 9:313-323(1995).
[ 2] Reed J.C., Bidwai A.P., Glover C.V.C. J. Biol. Chem. 269:18192-18200(1994).
[0293] 79. CLP_protease (Clp protease)

These proteins belong to family S14 in the classification of peptidases.

- In E. coli Clp, Ser-111, His-136 and Asp-185 form the catalytic triad.
- Swiss:P48254 has lost all of these active site residues and is therefore inactive.
- Swiss:P42379 contains two large insertions. Swiss:P42380 contains one large insertion.

Number of members: 38

[0294] The endopeptidase Clp (EC 3.4.21.92) from Escherichia coli cleaves peptides in various proteins in a process [...] of two related ATP-binding regulatory subunits (genes clpA and clpX). ClpP is a serine protease which has a chymotrypsin-like [...] family of serine proteases, but which evolved by independent convergent evolution. Proteases highly similar [...] have been found to be encoded in the genome of the chloroplast of plants and seem to be also present in other eukaryotes. The sequences around two of the residues involved in the catalytic triad (a serine and a histidine) are highly conserved and can be used as signature patterns specific to that category of proteases.

- Consensus pattern: T-x(2)-[LIVMF]-G-x-A-[SAC]-S-[MSA]-[PAG]-[STA] [S is the active site residue]

- Consensus pattern: R-x(3)-[EAP]-x(3)-[LIVMFYT]-M-[LIVM]-H-Q-P [H is the active site residue]

[1] Medline: 98050920. The structure of ClpP at 2.3 angstroms resolution suggests a model for ATP-dependent proteolysis. Wang J, Hartling JA, Flanagan JM; Cell 1997;91:447-456.
[ 1] Maurizi M.R., Clark W.P., Kim S.-H., Gottesman S. J. Biol. Chem. 265:12546-12552(1990).
[ 2] Gottesman S., Maurizi M.R. Microbiol. Rev. 56:592-621(1992).
[ 3] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:19-61(1994).

[0295] 80. CNG_membrane (Transmembrane region cyclic Nucleotide Gated Channel)

[...] found [...] the N-terminus of the cNMP_binding. Number of members: 56. Proteins that bind cyclic nucleotides (cAMP or cGMP) share a structural domain of about 120 residues [1-3]. The best studied of these proteins is the prokaryotic catabolite gene activator (also known as the cAMP receptor protein) (gene crp), where such a domain is [...] of [...] and a distinctive eight-stranded, antiparallel beta-barrel structure.

Such a domain is known to exist in the following proteins:

- Prokaryotic catabolite gene activator protein (CAP).
- cAMP- and cGMP-dependent protein kinases [...] nucleotide-binding domain. The cAPKs are composed of two different subunits: a catalytic chain and a regulatory chain which contains both copies of the domain. The cGPKs are single chain enzymes that [...] in their N-terminal section. The nucleotide specificity of cAPK and cGPK is due to an amino acid in the conserved region of beta-barrel 7: a threonine that is invariant in cGPK is an alanine in most [...].
- Vertebrate cyclic nucleotide-gated ion-channels. Two such cation channels have been fully characterized. One is found in rod cells where it plays a role in visual signal transduction. It specifically binds to [...] odorant signal transduction.

There are six invariant amino acids in this [...] glycine residues that are thought to be [...] of the structural [...]. [...] within beta-barrels 2 and 3 and contains the first two conserved Gly. The second pattern is located within beta-barrels 6 and 7 and contains the third conserved Gly as well as the three other invariant residues.

- Consensus pattern: [LIVM]-[VIC]-x(2)-G-[DENQTA]-x-[GAC]-x(2)-[LIVMFY](4)-x(2)-G

- Consensus pattern: [LIVMF]-G-E-x-[GAS]-[LIVM]-x(5,11)-R-[STAQ]-A-x-[LIVMA]-x-[STACV]

[ 1] Weber I.T., Shabb J.B., Corbin J.D. Biochemistry 28:6122-6127(1989).
[ 2] Kaupp U.B. Trends Neurosci. 14:150-157(1991).
[ 3] Shabb J.B., Corbin J.D. J. Biol. Chem. 267:5723-5726(1992).

[0296] 81. COX10_ctaB_cyoE (Cytochrome c oxidase assembly factor)
[1]

Medline: 95191390

Biosynthesis and functional role of haem O and haem A
Mogi T, Saiki K, Anraku Y;

Mol Microbiol 1994;14:391-398.
Cytochrome c oxidase is a multi-subunit enzyme. The complexity of this enzyme requires assistance in building the complex.

This is carried out by the cytochrome c oxidase assembly factor.
Number of members: 31

[0297] Cytochrome c oxidase is an oligomeric enzymatic complex which seems to require the aid of a number of proteins that either act as chaperonins to help the subunits of the enzyme to fold correctly, or assist in the assembly of the metal centers [1]. One of these proteins is known as COX10 in yeast and as ctaB [2] in aerobic prokaryotes. It is evolutionarily related to the cyoE protein from the Escherichia coli cytochrome O terminal oxidase complex.
[0298] These proteins probably contain [3] seven transmembrane segments. The most conserved region is located in a loop between the second and third of these segments and has been selected as a signature pattern.

- Consensus pattern: [ED]-x-D-x(2)-M-x-R-T-x(2)-R-x(4)-G

[ 1] Nobrega M.P., Nobrega F.G., Tzagoloff A.

J. Biol. Chem. 265:14220-14226(1990).
[ 2] Cao J., Hosier J., Shapleigh J., Revzin A., Ferguson-Miller S.
J. Biol. Chem. 267:24273-24278(1992).

[ 3] Chepuri V., Gennis R.B.

J. Biol. Chem. 265:12978-12986(1990).

[0299] 82. COX3 (Cytochrome c oxidase subunit III)
This family corresponds to chains c and p.

[1]

Medline: 96216288
The whole structure of the 13-subunit oxidized cytochrome c oxidase at 2.8 A.

Tsukihara T, Aoyama H, Yamashita E, Tomizaki T, Yamaguchi H, Shinzawa-Itoh K, Nakashima R, Yaono R, Yoshikawa S;
Science 1996;272:1136-1144.
Number of members: 224

[0300] 83. COX5B (Cytochrome c oxidase subunit Vb)

[1]

Medline: 96216288

The whole structure of the 13-subunit oxidized cytochrome c oxidase at 2.8 A.

Tsukihara T, Aoyama H, Yamashita E, Tomizaki T, Yamaguchi H, Shinzawa-Itoh K, Nakashima R, Yaono R, Yoshikawa S;

Science 1996;272:1136-1144.

This family consists of chains F and S
Number of members: 10

[0301] Cytochrome c oxidase (EC 1.9.3.1) [1] is an oligomeric enzymatic complex which is a component of the respiratory chain complex and is involved in the transfer of electrons from cytochrome c to oxygen. In eukaryotes this enzyme complex is located in the mitochondrial inner membrane; in aerobic prokaryotes it is found in the plasma membrane. In addition to the three large subunits that form the catalytic center of the enzyme complex there are, in eukaryotes, a variable number of small polypeptide subunits. One of these subunits, which is known as Vb in mammals, V in slime mold and IV in yeast, binds a zinc atom. The sequence of subunit Vb is well conserved and includes three conserved cysteines that are thought to coordinate the zinc ion [2]. Two of these cysteines are clustered in the C-terminal section of the subunit; this region has been selected as a signature pattern.

- Consensus pattern: [LIVM](2)-[FYW]-x(10)-C-x(2)-C-G-x(2)-[FY]-K-L [The two C's probably bind zinc]

[ 1] Capaldi R.A., Malatesta F., Darley-Usmar V.M. Biochim. Biophys. Acta 726:135-148(1983).

[ 2] Rizzuto R., Sandona D., Brini M., Capaldi R.A., Bisson R. Biochim. Biophys. Acta 1129:100-104(1991).
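
The COX5B signature above annotates specific residues (the two cysteines that probably bind zinc). When a pattern carries such annotations, capture groups can report where the annotated residues fall in a matched sequence. The sketch below is illustrative: the regular expression is an assumed reading of the consensus pattern, and the toy sequence and the name `zinc_cysteines` are hypothetical.

```python
import re

# Assumed regex reading of the COX5B consensus pattern; the two annotated
# cysteines are wrapped in capture groups so their positions can be reported.
COX5B = re.compile(r"[LIVM]{2}[FYW].{10}(C).{2}(C)G.{2}[FY]KL")


def zinc_cysteines(sequence: str):
    """Return 1-based positions of the two annotated cysteines, or None."""
    match = COX5B.search(sequence)
    if match is None:
        return None
    return (match.start(1) + 1, match.start(2) + 1)


# Hypothetical toy sequence constructed to satisfy the pattern.
toy = "GG" + "LIF" + "A" * 10 + "C" + "AA" + "C" + "G" + "AA" + "F" + "KL" + "GG"

print(zinc_cysteines(toy))
```

The same approach applies to the other signatures in this section that mark active-site or disulfide-bonded residues in square-bracket notes.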

[0302] 84. COesterase (Carboxylesterases)
Cholinesterase pages

The prints entry is specific to acetylcholinesterase
Number of members: 273

- Drosophila esterase P.
- Culex pipiens (mosquito) esterases B1 and B2.
- Myzus persicae (peach-potato aphid) esterases E4 and FE4.
- Insect juvenile hormone esterase (JH esterase) (EC 3.1.1.59).
- Lipases (EC 3.1.1.3) from the fungi Geotrichum candidum and Candida rugosa.
- Caenorhabditis gut esterase (gene ges-1).
- Duck fatty acyl-CoA hydrolase, medium chain (EC 3.1.2.[...]) [...] proliferation and may play a role in the [...] pheromones.
- Membrane enclosed crystal proteins from slime [...] where they are found have therefore [...].

[0304] So far two bacterial proteins have been found to belong to this family:

- [...] phenmedipham and desmedipham by hydrolyzing their central carbamate linkages.
- Para-nitrobenzyl esterase from Bacillus subtilis (gene pnbA).

[...] having [...] their [...] a domain evolutionarily related to that [...]:

- [...]
- Drosophila protein neurotactin (gene nrt) which [...] between embryonic cells [...].
- Drosophila protein glutactin (gene glt), whose function is not known.

[ 1] Myers M., Richmond R.C., Oakeshott J.G. Mol. Biol. Evol. 5:113-119(1988).
[...] (1993).

[ 4] Lockridge O. BioEssays 9:125-128(1988).

[ 5] Wang C.-S., Hartsuck J.A. Biochim. Biophys. Acta 1166:1-19(1993).

[0307] 85. CPSase_L_chain (Carbamoyl-phosphate synthase (CPSase)) 
[1]

Medline: 94347758

Three-dimensional structure of the biotin carboxylase subunit of acetyl-CoA carboxylase.
Waldrop GL, Rayment I, Holden HM;

Biochemistry 1994;33:10249-10256.

[1] 

Medline: 90285162 

Mammalian carbamyl phosphate synthetase (CPS). DNA sequence and evolution of the CPS domain of the Syrian 
hamster multifunctional protein CAD. 

Simmer JP, Kelly RE, Rinker AG Jr, Scully JL, Evans DR;
J Biol Chem 1990;265:10395-10402.

Carbamoyl-phosphate synthase catalyzes the ATP-dependent synthesis of carbamyl-phosphate from glutamine or 
ammonia and bicarbonate. This important enzyme initiates both the urea cycle and the biosynthesis of arginine and/ 
or pyrimidines [2]. 

The carbamoyl-phosphate synthase (CPS) enzyme in prokaryotes is a heterodimer of a small and large chain. The 
small chain promotes the hydrolysis of glutamine to ammonia, which is used by the large chain to synthesize carbamoyl 
phosphate. See CPSase_sm_chain. 

The small chain has a GATase domain in the carboxyl terminus. 
See GATase. 
Number of members: 181 

[0308] Carbamoyl-phosphate synthase (CPSase) catalyzes the ATP-dependent synthesis of carbamyl-phosphate 
from glutamine (EC 6.3.5.5) or ammonia (EC 6.3.4.16) and bicarbonate [1]. This important enzyme initiates both the 
urea cycle and the biosynthesis of arginine and pyrimidines. 

[0309] Glutamine-dependent CPSase (CPSase II) is involved in the biosynthesis of pyrimidines and purines. In bac- 
teria such as Escherichia coli, a single enzyme is involved in both biosynthetic pathways while other bacteria have 
separate enzymes. The bacterial enzymes are formed of two subunits: a small chain (gene carA) that provides
glutamine amidotransferase activity (GATase), necessary for removal of the ammonia group from glutamine, and a
large chain (gene carB) that provides CPSase activity. Such a structure is also present in fungi for arginine biosynthesis 
(genes CPA1 and CPA2). In most eukaryotes, the first three steps of pyrimidine biosynthesis are catalyzed by a large 
multifunctional enzyme - called URA2 in yeast, rudimentary in Drosophila and CAD in mammals [2]. The CPSase 
domain is located between an N-terminal GATase domain and the C-terminal part, which encompasses the dihydroorotase
and aspartate transcarbamylase activities. 

[0310] Ammonia-dependent CPSase (CPSase I) is involved in the urea cycle in ureolytic vertebrates; it is a mono- 
functional protein located in the mitochondrial matrix. 

[0311] The CPSase domain is typically 120 Kd in size and has arisen from the duplication of an ancestral subdomain
of about 500 amino acids. Each subdomain independently binds to ATP and it is suggested that the two homologous 
halves act separately, one to catalyze the phosphorylation of bicarbonate to carboxy phosphate and the other that of 
carbamate to carbamyl phosphate. 
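
As a rough consistency check on the sizes quoted above, the mass of a domain built from two subdomains of about 500 residues can be estimated from the residue count alone. The sketch below assumes an average residue mass of about 110 Da, a common rule of thumb that is not stated in this text; the constant and function names are illustrative only.

```python
# Assumption (not from this document): an average amino-acid residue
# contributes roughly 110 Da to the mass of a polypeptide.
AVG_RESIDUE_MASS_DA = 110.0


def approx_mass_kd(n_residues, avg_residue_mass_da=AVG_RESIDUE_MASS_DA):
    """Estimate the mass in Kd of a polypeptide from its residue count."""
    return n_residues * avg_residue_mass_da / 1000.0


# Two duplicated subdomains of about 500 residues each, as described above.
subdomain_residues = 500
domain_estimate_kd = approx_mass_kd(2 * subdomain_residues)
print("Estimated CPSase domain mass: %.0f Kd" % domain_estimate_kd)
```

Under this assumption the estimate comes out on the order of 110 Kd, which is in reasonable agreement with the stated domain size given that both figures are approximate.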

[0312] The CPSase subdomain is also present in a single copy in the biotin-dependent enzymes acetyl-CoA car- 
boxylase (EC 6.4.1.2) (ACC), propionyl-CoA carboxylase (EC 6.4.1.3) (PCCase), pyruvate carboxylase (EC 6.4.1.1) 
(PC) and urea carboxylase (EC 6.3.4.6). 

[0313] Two conserved regions which are probably important for binding ATP and/or catalytic activity have been selected
as signatures for the subdomain.

- Consensus pattern: [FYV]-[PS]-[LIVMC]-[LIVMA]-[LIVM]-[KR]-[PSA]-[STA]-x(3)-[SG]-G-x-[AG]

- Consensus pattern: [LIVMF]-[LIMN]-E-[LIVMCA]-N-[PATLIVM]-[KR]-[LIVMSTAC] 
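
Consensus patterns written in this notation can be applied to a sequence by translating them into regular expressions. The following is a minimal sketch, assuming the usual reading of the notation (square brackets list allowed residues, `x` is any residue, a number in parentheses is a repeat count or range, curly braces list excluded residues). The function names and the test peptide are hypothetical and are not taken from this document.

```python
import re


def prosite_to_regex(pattern):
    """Translate a consensus pattern such as 'x(2)-[KR]-G' into a regex string."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        # Separate the residue specification from an optional repeat, e.g. (3) or (6,7).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.group(1), m.group(2), m.group(3)
        if core == "x":
            token = "."                        # any residue
        elif core.startswith("[") and core.endswith("]"):
            token = core                       # set of allowed residues
        elif core.startswith("{") and core.endswith("}"):
            token = "[^" + core[1:-1] + "]"    # set of excluded residues
        else:
            token = re.escape(core)            # a single fixed residue
        if low and high:
            token += "{%s,%s}" % (low, high)
        elif low:
            token += "{%s}" % low
        parts.append(token)
    return "".join(parts)


def find_motif(pattern, sequence):
    """Return (start, matched_text) for each non-overlapping hit in the sequence."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start(), m.group(0)) for m in regex.finditer(sequence.upper())]


# Second CPSase signature as printed above, applied to a made-up peptide.
CPSASE_2 = "[LIVMF]-[LIMN]-E-[LIVMCA]-N-[PATLIVM]-[KR]-[LIVMSTAC]"
hits = find_motif(CPSASE_2, "AAALLEVNPRLGGG")
print(hits)
```

The start positions are zero-based. A hit only shows that a sequence is compatible with the pattern; it is not by itself evidence of family membership.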

[ 1] Simmer J.P., Kelly R.E., Rinker A.G. Jr., Scully J.L., Evans D.R.

J. Biol. Chem. 265:10395-10402(1990).
[ 2] Davidson J.N., Chen K.C., Jamison R.S., Musmanno L.A., Kern C.B.

BioEssays 15:157-164(1993).

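[0314] 86. CPSase_sm_chain (Carbamoyl-phosphate synthase small chain, CPSase domain)
[1]

Medline: 90285162

Mammalian carbamyl phosphate synthetase (CPS). DNA sequence and evolution of the CPS domain of the Syrian
hamster multifunctional protein CAD.

Simmer JP, Kelly RE, Rinker AG Jr, Scully JL, Evans DR;
J Biol Chem 1990;265:10395-10402.

The carbamoyl-phosphate synthase domain is in the amino terminus of the protein.

Carbamoyl-phosphate synthase catalyzes the ATP-dependent synthesis of carbamyl-phosphate from glutamine or
ammonia and bicarbonate. This important enzyme initiates both the urea cycle and the biosynthesis of arginine and/
or pyrimidines [1].

The carbamoyl-phosphate synthase (CPS) enzyme in prokaryotes is a heterodimer of a small and large chain. The
small chain promotes the hydrolysis of glutamine to ammonia, which is used by the large chain to synthesize carbamoyl
phosphate. See CPSase_L_chain.

The small chain has a GATase domain in the carboxyl terminus.
See GATase.
Number of members: 46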

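Carbamoyl-phosphate synthase (CPSase) catalyzes the ATP-dependent synthesis of carbamyl-phosphate
from glutamine (EC 6.3.5.5) or ammonia (EC 6.3.4.16) and bicarbonate [1]. This important enzyme initiates both the
urea cycle and the biosynthesis of arginine and pyrimidines.

Glutamine-dependent CPSase (CPSase II) is involved in the biosynthesis of pyrimidines and purines. In bacteria
such as Escherichia coli, a single enzyme is involved in both biosynthetic pathways while other bacteria have
separate enzymes. The bacterial enzymes are formed of two subunits: a small chain (gene carA) that provides
glutamine amidotransferase activity (GATase), necessary for removal of the ammonia group from glutamine, and a
large chain (gene carB) that provides CPSase activity. Such a structure is also present in fungi for arginine biosynthesis
(genes CPA1 and CPA2). In most eukaryotes, the first three steps of pyrimidine biosynthesis are catalyzed by a large
multifunctional enzyme - called URA2 in yeast, rudimentary in Drosophila and CAD in mammals [2]. The CPSase
domain is located between an N-terminal GATase domain and the C-terminal part, which encompasses the dihydroorotase
and aspartate transcarbamylase activities.

Ammonia-dependent CPSase (CPSase I) is involved in the urea cycle in ureolytic vertebrates; it is a monofunctional
protein located in the mitochondrial matrix.

The CPSase domain is typically 120 Kd in size and has arisen from the duplication of an ancestral subdomain
of about 500 amino acids. Each subdomain independently binds to ATP and it is suggested that the two homologous
halves act separately, one to catalyze the phosphorylation of bicarbonate to carboxy phosphate and the other that of
carbamate to carbamyl phosphate.

The CPSase subdomain is also present in a single copy in the biotin-dependent enzymes acetyl-CoA carboxylase
(EC 6.4.1.2) (ACC), propionyl-CoA carboxylase (EC 6.4.1.3) (PCCase), pyruvate carboxylase (EC 6.4.1.1)
(PC) and urea carboxylase (EC 6.3.4.6).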

[0321] 87. CRAL_TRIO (CRAL/TRIO domain)
Medline: 98121119

Nature 1998;391:506-510

[...] because it does not appear to be a complete structural domain.
Number of members: 39

[0322] 88. CSD ('Cold-shock' DNA-binding domain)

Medline: 94255482
Crystal structure of CspA, the major cold shock protein of Escherichia coli.
Schindelin H, Jiang W, Inouye M, Heinemann U;
Proc Natl Acad Sci U S A 1994;91:5119-5123. 
Number of members: 121 

[0323] A conserved domain of about 70 amino acids has been found in prokaryotic and eukaryotic DNA-binding
proteins [1,2,3,E1]. This domain, which is known as the 'cold-shock domain' (CSD), is present in the proteins listed below.

- Escherichia coli protein CS7.4 (gene cspA), which is induced in response to low temperature (cold-shock protein)
and which binds to and stimulates the transcription of the CCAAT-containing promoters of the H-NS protein and
of gyrA.
- Mammalian Y box binding protein 1 (YB1). A protein that binds to the CCAAT-containing Y box of mammalian HLA
class II genes.
- Xenopus Y box binding proteins -1 and -2 (Y1 and Y2). Proteins that bind to the CCAAT-containing Y box of
Xenopus hsp70 genes.
- Xenopus B box binding protein (YB3). YB3 binds the B box promoter element of genes transcribed by RNA polymerase III.
- Enhancer factor I subunit A (EFI-A) (dbpB). A protein that also binds to the CCAAT motif in various gene promoters.
- DbpA, a human DNA-binding protein of unknown specificity.
- Bacillus subtilis cold-shock proteins cspB and cspC.
- Streptomyces clavuligerus protein SC 7.0.
- Escherichia coli proteins cspB, cspC, cspD, cspE and cspF.
- Unr, a mammalian gene encoded upstream of the N-ras gene. Unr contains nine repeats that are similar to the
CSD domain. The function of Unr is not yet known but it could be a multivalent DNA-binding protein.

[0324] As a signature pattern for the CSD domain, its most conserved region, which is located in its N-terminal section,
has been selected. It must be noted that the beginning of this region is highly similar [4] to the RNP-1 RNA-binding motif.

- Consensus pattern: [FY]-G-F-I-x(6,7)-[DER]-[LIVM]-F-x-H-x-[STKR]-x-[LIVMFY]

[ 1] Doniger J., Landsman D., Gonda M.A., Wistow G.
New Biol. 4:389-395(1992).
[ 2] Wistow G.
Nature 344:823-824(1990).
[ 3] Jones P.G., Inouye M.
Mol. Microbiol. 11:811-818(1994).
[ 4] Landsman D.
Nucleic Acids Res. 20:2861-2864(1992).

[0325] 89. CTF_NFI (CTF/NF-I family)
Number of members: 45 

[0326] Nuclear factor I (NF-I) or CCAAT box-binding transcription factor (CTF) [1,2] (also known as TGGCA-binding
proteins) are a family of vertebrate nuclear proteins which recognize and bind, as dimers, the palindromic DNA sequence
5'-TGGCANNNTGCCA-3'. CTF/NF-I binding sites are present in viral and cellular promoters and in the origin
of DNA replication of Adenovirus type 2.
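
The palindromic character of this binding site, and a simple scan for it in a DNA sequence, can be illustrated as follows. This is a minimal sketch: it treats N as any of A, C, G or T, uses hypothetical function names, and runs on a made-up test sequence rather than on any sequence from this document.

```python
import re

# Complement table for the bases used here; N stays N.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

# Binding site as given in the text (N = any base).
NFI_SITE = "TGGCANNNTGCCA"


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def site_to_regex(site):
    """Turn a site containing N wildcards into a regular expression."""
    return "".join("[ACGT]" if base == "N" else base for base in site.upper())


def find_sites(dna, site=NFI_SITE):
    """Return (start, matched_text) for every occurrence, overlaps included."""
    regex = re.compile("(?=(%s))" % site_to_regex(site))
    return [(m.start(), m.group(1)) for m in regex.finditer(dna.upper())]


# The site reads the same on both strands, which is what "palindromic" means here.
print(reverse_complement(NFI_SITE) == NFI_SITE)

# Hypothetical test sequence containing one copy of the site.
print(find_sites("CCTGGCACTGTGCCAAG"))
```

Because the site equals its own reverse complement, scanning one strand is sufficient to find sites on both.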

[0327] The CTF/NF-I proteins were first identified as nuclear factor I, a collection of proteins that activate the repli- 
cation of several Adenovirus serotypes (together with NF-II and NF-III) [3]. The family of proteins was also identified 
as the CTF transcription factors, before the NFI and CTF families were found to be identical [4]. The CTF/NF-I proteins 
are individually capable of activating transcription and DNA replication. The CTF/NF-I family has also been
dubbed NFI, NF-I or NF1.

[0328] In a given species, there are a large number of different CTF/NF-I proteins. The multiplicity of CTF/NF-I is 
known to be generated both by alternative splicing and by the occurrence of four different genes. The known forms of 
NF-I genes have been classified as: 

- The CTF-like factors subfamily (prototype form: CTF-1) [4].
- The NFI-X proteins.
- The NFI-A proteins.
- The NFI-B proteins.

The N-terminal region of these proteins, which is almost perfectly conserved in all species and genes sequenced, is
involved in DNA recognition, protein dimerization and Adenovirus DNA replication. The C-terminal region contains the
transcriptional activation domain. This activation domain is the target of gene expression regulatory pathways and
interacts with basal transcription factors.

- Consensus pattern: R-K-R-K-Y-F-K-K-H-E-K-R

[ 1] Mermod N., O'Neill E.A., Kelly T.J., Tjian R.
Cell 58:741-753(1989).

[ 2] Rupp R.A.W., Kruse U., Multhaup G., Goebel U., Beyreuther K., Sippel A.E.
Nucleic Acids Res. 18:2607-2616(1990).
[ 3] Nagata K., Guggenheimer R.A., Enomoto T., Lichy J.H., Hurwitz J.
Proc. Natl. Acad. Sci. U.S.A. 79:6438-6442(1982).
[ 4] Santoro C., Mermod N., Andrews P.C., Tjian R.
Nature 334:218-224(1988).
[ 5] Gil G., Smith J.R., Goldstein J.L., Slaughter C.A., Orth K., Brown M.S., Osborne T.F.
Proc. Natl. Acad. Sci. U.S.A. 85:8963-8967(1988).

[ lTe^79^ 

[0332] 90. Calsequestrin (Calsequestrin)
Number of members: 13

Calsequestrin is a calcium-binding protein of muscle, located in the lumen of the sarcoplasmic reticulum terminal
cisternae. Calsequestrin acts as a calcium buffer and plays an important role in muscle excitation-contraction
coupling. It is an acidic protein of about 400 amino acid residues that binds more than 40 moles of calcium per mole
of protein.

- Consensus pattern: [EQ]-[DE]-G-L-[DN]-F-P-x-Y-D-G-x-D-R-V

- Consensus pattern: [DE]-L-E-D-W-[LIVM]-E-D-V-L-x-G-x-[LIVM]-N-T-E-D-D-D

[0335] [ 1] Treves S., Vilsen B., Chiozzi P., Andersen J.P., Zorzato F.
Biochem. J. 283:767-772(1992).

[0336] 91. Carboxyl_trans (Carboxyl transferase domain)
Medline: 93374821

Thornton CG, Kumar GK, Haase FC, Phillips NF, Woo SB, Park VM,
Magner WJ, Shenoy BC, Wood HG, Samols D; J Bacteriol 1993;175:5301-5308.
Medline: 93358891
Molecular evolution of biotin-dependent carboxylases.
Toh H, Kondo H, Tanabe T;
Eur J Biochem 1993;215:687-696.
All of the members in this family are biotin dependent carboxylases.
All of the members in this family utilise acyl-CoA as the acceptor molecule.
Number of members: 47 

[0337] 92. Chal_stil_synt (Chalcone and stilbene synthases)
Number of members: 146

[0338] Chalcone synthases (CHS) (EC 2.3.1.74) and stilbene synthases (STS) (formerly known as resveratrol synthases)
are related plant enzymes [1]. CHS is an important enzyme in flavonoid biosynthesis and STS a key enzyme
in stilbene-type phytoalexin biosynthesis. Both enzymes catalyze the addition of three molecules of malonyl-CoA to a
starter CoA ester (a typical example is 4-coumaroyl-CoA), producing either a chalcone (with CHS) or stilbene (with
STS).

[0339] These enzymes are proteins of about 390 amino-acid residues. A conserved cysteine residue, located in the
central section of these proteins, has been shown [2] to be essential for the catalytic activity of both enzymes and
probably represents the binding site for the 4-coumaroyl-CoA group. The region around this active site residue is well
conserved and can be used as a signature pattern. 

[0340] In addition to the plant enzymes, this family also includes Bacillus subtilis bcsA. 

- Consensus pattern: R-[LIVMFYS]-x-[LIVM]-x-[QHG]-x-G-C-[FYNA]-[GA]-G-[GA]-[STAV]-x-[LIVMF]-[RA] [C is the
active site residue]

[ 1] Schroeder J., Schroeder G.
Z. Naturforsch. 45C:1-8(1990).

[ 2] Lanz T., Tropf S., Marner F.-J., Schroeder J., Schroeder G.
J. Biol. Chem. 266:9971-9976(1991).

[0341] 93. Chorismate_synt (Chorismate synthase) 

Number of members: 19

[0342] Chorismate synthase (EC 4.6.1.4) catalyzes the last of the seven steps in the shikimate pathway which is
used in prokaryotes, fungi and plants for the biosynthesis of aromatic amino acids. It catalyzes the 1,4-trans elimination
of the phosphate group from 5-enolpyruvylshikimate-3-phosphate (EPSP) to form chorismate which can then be used
in phenylalanine, tyrosine or tryptophan biosynthesis. Chorismate synthase requires the presence of a reduced flavin
mononucleotide (FMNH2 or FADH2) for its activity.

[0343] Chorismate synthase from various sources shows [1,2] a high degree of sequence conservation. It is a protein
of about 360 to 400 amino-acid residues. Three signature patterns have been developed from conserved regions rich 
in basic residues (mostly arginines). The first is in the N-terminal section, the second is central and the third is C-terminal. 

- Consensus pattern: G-E-S-H-[GC]-x(2)-[LIVM]-[GTV]-x-[LIVM](2)-[DE]-G-x-[PV]

- Consensus pattern: [GE]-R-[SA](2)-[SAG]-R-[EV]-[ST]-x(2)-[RH]-V-x(2)-G

- Consensus pattern: R-[SH]-D-[PSV]-[CSAV]-x(4)-[GAI]-x-[IVGSP]-[LIVM]-x-E-[STAH]-[LIVM]

[ 1] Schaller A., Schmid J., Leibinger U., Amrhein N.
J. Biol. Chem. 266:21434-21438(1991).

[ 2] Jones D.G.L., Reusser U., Braus G.H.
Mol. Microbiol. 5:2143-2152(1991).

[0344] 94. Clat_adaptor_s (Clathrin adaptor complex small chain)

Number of members: 21

[0345] Clathrin coated vesicles (CCV) mediate intracellular membrane traffic such as receptor mediated endocytosis.
In addition to clathrin, the CCV are composed of a number of other components including oligomeric complexes which
are known as adaptor or clathrin assembly proteins (AP) complexes [1]. The adaptor complexes are believed to interact
with the cytoplasmic tails of membrane proteins, leading to their selection and concentration. In mammals two types of
adaptor complexes are known: AP-1, which is associated with the Golgi complex, and AP-2, which is associated with
the plasma membrane. Both AP-1 and AP-2 are heterotetramers that consist of two large chains - the adaptins -
(gamma and beta' in AP-1; alpha and beta in AP-2); a medium chain (AP47 in AP-1; AP50 in AP-2) and a small chain
(AP19 in AP-1; AP17 in AP-2).

[0346] The small chains of AP-1 and AP-2 are evolutionarily related proteins of about 18 Kd. Homologs of AP17 and
AP19 have also been found in yeast (genes APS1/YAP19 and APS2/YAP17) [2,3,4]. AP17 and AP19 are also related
to the zeta-chain [5] of coatomer (zeta-cop), a cytosolic protein complex that reversibly associates with Golgi membranes
to form vesicles that mediate biosynthetic protein transport from the endoplasmic reticulum, via the Golgi up
to the trans Golgi network.
[0347] A signature pattern has been developed for these proteins.

- Consensus pattern: [LIVM](2)-Y-[KR]-x(4)-L-Y-F

[ 1] Pearse B.M., Robinson M.S.
Annu. Rev. Cell Biol. 6:151-171(1990).

[ 2] Kirchhausen T., Davis A.C., Frucht S., O'Brine Greco B., Payne G.S., Tubb B.

J. Biol. Chem. 266:11153-11157(1991).
[ 3] Nakai M., Takada T., Endo T.

Biochim. Biophys. Acta 1174:282-284(1993).

J. Cell Biol. 123:1727-1734(1993). 

[0348] 95. Clathrin_lg_ch (Clathrin light chain)
Number of members: 8 

ess srisi r„3r ssrs^r-srrr. - - ° mm * - — - 

isr" * • <~ — • i «* S£U ,„ „ * ,„ 

- Consensus pattern: F-L-A-Q-Q-E-S 

[ 1] Keen J.H.

Annu. Rev. Biochem. 59:415-438(1990).
[ 2] Brodsky F.M.

Science 242:1396-1402(1988).
[ 3] Brodsky F.M., Hill B.L., Acton S.L., Naethke I., Wong D.H.,
Ponnambalam S., Parham P.

Trends Biochem. Sci. 16:208-213(1991).

[0352] 96. (Clathrin repeat) 7-fold repeat in Clathrin and VPS 
Each repeat is about 140 amino acids long. The repeats occur in the arm region of the Clathrin heavy chain.
Number of members: 79

[1]

Medline: 92191269

Folding and trimerization of clathrin subunits at the triskelion hub.
Nathke IS, Heuser J, Lupas A, Stock J, Turck CW, Brodsky FM;
Cell 1992;68:899-910.
[2]

Medline: 88097376

Clathrin heavy chain: molecular cloning and complete primary structure.
Kirchhausen T, Harrison SC, Chow EP, Mattaliano RJ,
Ramachandran KL, Smart J, Brosius J;
Proc Natl Acad Sci U S A 1987;84:8805-8809.
[0353] 97. Collagen (Collagen triple helix repeat (20 copies)) 
[1] Medline: 94059583 
New members of the collagen superfamily.
Mayne R, Brewton RG;
Curr Opin Cell Biol 1993;5:883-890.
Scurvy is associated with collagens.
Members of this family belong to the collagen superfamily [1].

Collagens are generally extracellular structural proteins involved in formation of connective tissue structure.
The alignment contains 20 copies of the G-X-Y repeat that forms a triple helix. The first position of the repeat is glycine;
the second and third positions can be any residue but are frequently proline and hydroxyproline. Collagens are
post-translationally modified by proline hydroxylase to form the hydroxyproline residues. Defective hydroxylation is the cause
of scurvy.

Some members of the collagen superfamily are not involved in connective tissue structure but share the same triple
helical structure.

Number of members: 2125 
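
The G-X-Y repeat described above lends itself to a simple scan for runs of consecutive triplets that start with glycine. The sketch below is illustrative only: the function name and the test peptides are hypothetical, and it checks nothing beyond the glycine at the first position of each triplet.

```python
def longest_gxy_run(sequence):
    """Return (start, n_triplets) for the longest run of consecutive G-X-Y triplets.

    A triplet is counted when its first residue is glycine and all three
    positions lie inside the sequence; X and Y may be any residue.
    """
    seq = sequence.upper()
    best_start, best_count = 0, 0
    for start in range(len(seq)):
        if seq[start] != "G":
            continue
        pos, count = start, 0
        # Walk forward three residues at a time while each triplet starts with G.
        while pos + 3 <= len(seq) and seq[pos] == "G":
            count += 1
            pos += 3
        if count > best_count:
            best_start, best_count = start, count
    return best_start, best_count


# Hypothetical peptides: a short run embedded in other residues, and an
# idealised stretch of 20 repeats like the one modelled by the alignment.
print(longest_gxy_run("MKGPPGAPGEKL"))
print(longest_gxy_run("GPP" * 20))
```

A run of 20 or more triplets corresponds to the length of the alignment described above. The quadratic scan is adequate for single proteins and was chosen for clarity rather than speed.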

[0354] 98. Coprogen_oxidas (Coproporphyrinogen III oxidase)
Number of members: 12 

Coproporphyrinogen III oxidase (EC 1.3.3.3) (coproporphyrinogenase) [1,2] catalyzes the oxidative decarboxylation 
of coproporphyrinogen III into protoporphyrinogen IX, a common step in the pathway for the biosynthesis of porphyrins
such as heme, chlorophyll or cobalamin. 

[0355] Coproporphyrinogen III oxidase is an enzyme that requires iron for its activity. A cysteine seems to be important 
for the catalytic mechanism [3]. Sequences from a variety of eukaryotic and prokaryotic sources show that this enzyme 
has been evolutionarily conserved. A highly conserved region in the central part of the sequence has been selected 
as a signature pattern. This region contains the only conserved cysteine and is rich in charged amino acids. 

- Consensus pattern: K-x-W-C-x(2)-[FYH](3)-[LIVM]-x-H-R-x-E-x-R-G-[LIVM]-G-G-[LIVM]-F-F-D 

[ 1] Xu K., Elliott T.

J. Bacteriol. 175:4990-4999(1993).
[ 2] Kohno H., Furukawa T., Yoshinaga T., Tokunaga R., Taketani S.

J. Biol. Chem. 268:21359-21363(1993).
[ 3] Camadro J.M., Chambon H., Jolles J., Labbe P.

Eur. J. Biochem. 156:579-587(1986).
[ 4] Xu K., Elliott T.

J. Bacteriol. 176:3196-3203(1994).

[0356] 99. Corona_nucleoca (Coronavirus nucleocapsid protein) 
[1] 

Medline: 98087828 
Identification of a specific interaction between the 
coronavirus mouse hepatitis virus A59 nucleocapsid protein 
and packaging signal. 
Molenkamp R, Spaan WJ; 
Virology 1997;239:78-86. 
Number of members: 44 
[0357] 100. Cu-oxidase (Multicopper oxidase) 
[1]

Medline: 90126844 

The blue oxidases, ascorbate oxidase, laccase and ceruloplasmin. 
Modelling and structural relationships. 
Messerschmidt A, Huber R; 
Eur J Biochem 1990;187:341-352. 
Number of members: 150 

[0358] Multicopper oxidases [1,2] are enzymes that possess three spectroscopically different copper centers. These
centers are called: type 1 (or blue), type 2 (or normal) and type 3 (or coupled binuclear). The enzymes that belong to
this family are:

- Ascorbate oxidase (EC 1.10.3.3), a higher plant enzyme.

- Ceruloplasmin, which exhibits internal sequence homology and seems to have evolved from the triplication of a
copper-binding domain similar to that found in laccase and ascorbate oxidase.

- Blood coagulation factor V (Fa V).

- Blood coagulation factor VIII (Fa VIII) [E1].

- Yeast FET3 [3], which is required for ferrous iron uptake.

- Yeast hypothetical protein YFL041w and SpAC1F7.08, the fission yeast homolog.

Factors V and VIII consist of A-, B- and C-type domains in the following order: A-A-B-A-C-C. The A-type
domain is related to the multicopper oxidases.

Two signature patterns have been derived from a region that contains residues
that are known to be involved in the binding of copper. The first pattern does not make any assumption on the
presence of copper-binding residues and thus can detect domains that have lost the ability to bind copper
(such as those of Fa V and Fa VIII).

[The first two H's are copper type 3 binding residues] [The C,
the 3rd H, and L or M are copper type 1 ligands]

[0362] 101. Cullin (Cullin family) 
Number of members: 24 

[0363] The following proteins are collectively termed cullins [1]: 

- Caenorhabditis elegans cul-2, cul-3, cul-4 (F45E12.3), cul-5 (ZK856.1) and cul-6 (K08E7.7).

- Mammalian CUL1, CUL2, CUL3, CUL4A and CUL4B.

- Mammalian vasopressin-activated calcium-mobilizing receptor (VACM-1).

- Fission yeast hypothetical protein SpAC24H6.03.

[0364] The cullins are hydrophilic proteins of 740 to 815 amino acids. The C-terminal extremity is the most conserved
part of these proteins. A signature pattern has been developed from that region.

- Consensus pattern: W^HL.V, W 

Burnatowska-Hledin M.A., Spielman W.S., Smith W.L., Shi P., Meyer J.M., Dewitt D.L.

Am. J. Physiol. 268:F1198-F1210(1995).

Mathias N., Johnson S.L., Winey M., Adams A.E., Goetsch L., Pringle J.R.,
Byers B., Goebl M.G.
Mol. Cell. Biol. 16:6634-6643(1996).

[0365] 102. (Cu_amine_oxid) 
Copper amine oxidase signatures 

Amine oxidases (AO) [1] are enzymes that catalyze the oxidation of a wide range of biogenic amines including many 
neurotransmitters, histamine and xenobiotic amines. There are two classes of amine oxidases: flavin-containing (EC 
1.4.3.4) and copper-containing (EC 1.4.3.6). 

[0366] Copper-containing AO is found in bacteria, fungi, plants and animals. It is a homodimeric enzyme that binds
one copper ion per subunit as well as a 2,4,5-trihydroxyphenylalanine quinone (or topaquinone) (TPQ) cofactor. This
cofactor is derived from a tyrosine residue.

[0367] Two signature patterns were derived for copper AO: the first one contains the tyrosine which gives rise to the
TPQ cofactor, while the second one contains one of the three histidines that bind the copper atom [2].
[0368] Consensus pattern: [LIVM]-[LIVMA]-[LIVMF]-x(4)-[ST]-x(2)-N-Y-[DE]-[YN] [The first Y gives rise to TPQ]. Sequences
known to belong to this class detected by the pattern: ALL.

[0369] Consensus pattern: T-x-[GS]-x(2)-H-[LIVMF]-x(3)-E-[DE]-x-P [H is a copper ligand]. Sequences known to belong
to this class detected by the pattern: ALL, except for lentil AO.

[ 1] Knowles P.F., Dooley D.M. (In) Metal ions in biological systems; Sigel H., Sigel A., Eds., 30:361-403, Marcel
Dekker, New York, (1993).

[ 2] Parsons M.R., Convery M.A., Wilmot C.M., Yadav K.D.S., Blakeley V., Corner A.S., Phillips S.E.V., McPherson
M.J., Knowles P.F. Structure 3:1171-1184(1995).

[0370] 103. Cys-protease (Cysteine protease) 
Number of members: 358 

[0371] Eukaryotic thiol proteases (EC 3.4.22.-) [1] are a family of proteolytic enzymes which contain an active site 
cysteine. Catalysis proceeds through a thioester intermediate and is facilitated by a nearby histidine side chain; an 
asparagine completes the essential catalytic triad. The proteases which are currently known to belong to this family 
are listed below (references are only provided for recently determined sequences). 

- Vertebrate lysosomal cathepsins B (EC 3.4.22.1), H (EC 3.4.22.16), L (EC 3.4.22.15), and S (EC 3.4.22.27) [2]. 

- Vertebrate lysosomal dipeptidyl peptidase I (EC 3.4.14.1) (also known as cathepsin C) [2]. 

- Vertebrate calpains (EC 3.4.22.17). Calpains are intracellular calcium-activated thiol proteases that contain both an
N-terminal catalytic domain and a C-terminal calcium-binding domain.
- Mammalian cathepsin K, which seems involved in osteoclastic bone resorption [3].
- Human cathepsin O [4].
- Bleomycin hydrolase. An enzyme that catalyzes the inactivation of the antitumor drug BLM (a glycopeptide).
- Plant enzymes: barley aleurain (EC 3.4.22.16), EP-B1/B4; kidney bean EP-C1, rice bean SH-EP; kiwi fruit actinidin
(EC 3.4.22.14); papaya latex papain (EC 3.4.22.2), chymopapain (EC 3.4.22.6), caricain (EC 3.4.22.30), and proteinase IV
(EC 3.4.22.25); pea turgor-responsive protein 15A; pineapple stem bromelain (EC 3.4.22.32); rape
COT44; rice oryzain alpha, beta, and gamma; tomato low-temperature induced; Arabidopsis thaliana A494, RD19A
and RD21A.
- House-dust mites allergens DerP1 and EurM1.
- Cathepsin B-like proteinases from the worms Caenorhabditis elegans (genes gcp-1, cpr-3, cpr-4, cpr-5 and cpr-6),
Schistosoma mansoni (antigen SM31) and japonica (antigen SJ31), Haemonchus contortus (genes AC-1 and
AC-2), and Ostertagia ostertagi (CP-1 and CP-3).
- Slime mold cysteine proteinases CP1 and CP2.
- Cruzipain from Trypanosoma cruzi and brucei.
- Trophozoite cysteine proteinase (TCP) from various Plasmodium species.
- Proteases from Leishmania mexicana, Theileria annulata and Theileria parva.
- Baculoviruses cathepsin-like enzyme (v-cath).
- Drosophila small optic lobes protein (gene sol), a neuronal protein that contains a calpain-like domain.
- Yeast thiol protease BLH1/YCP1/LAP3.
- Caenorhabditis elegans hypothetical protein C06G4.2, a calpain-like protein.

[0372] Two bacterial peptidases are also part of this family:

- Aminopeptidase C from Lactococcus lactis (gene pepC) [5].
- Thiol protease tpr from Porphyromonas gingivalis.

[0373] Three other proteins are structurally related to this family, but may have lost their proteolytic activity.

Rat testin should not be confused with mouse testin, which is a LIM-domain protein (see
<PDOC00382>).

The sequences around the three active site residues are well conserved and can be used as signature patterns.

" i^n,s,c;™^ 

[ 1] Dufour E.

Biochimie 70:1335-1342(1988).
[ 2] Kirschke H., Barrett A.J., Rawlings N.D.

Protein Prof. 2:1587-1643(1995).

[ 3] Shi G.-P., Chapman H.A., Bhairi S.M., Deleeuw C., Reddy V.Y., Weiss S.J.
FEBS Lett. 357:129-134(1995).

[ 4] Velasco G., Ferrando A.A., Puente X.S., Sanchez L.M., Lopez-Otin C.

J. Biol. Chem. 269:27136-27142(1994).

[ 5] Chapot-Chartier M.P., Nardi M., Chopin M.C., Chopin A., Gripon J.C.

Appl. Environ. Microbiol. 59:330-333(1993).
[ 6] Higgins D.G., McConnell D.J., Sharp P.M.

Nature 340:604-604(1989).
[ 7] Rawlings N.D., Barrett A.J.

Meth. Enzymol. 244:461-486(1994).

c ™Zei;^^^ 

J Mol Biol 1996;262:202-224. 
[1] Medline: 99059720 

EMBO J 1998;17:6827-6838 
Database Reference: SCOP; 1cs1; fa; [SCOP-USA][CATH-PDBSUM]

This family includes enzymes involved in cysteine and methionine metabolism. The following are members:

Cystathionine gamma-lyase,
Cystathionine gamma-synthase,
Cystathionine beta-lyase,
Methionine gamma-lyase,
OAH/OAS sulfhydrylase,
O-succinylhomoserine sulfhydrylase

All of these members participate in slightly different reactions.

- Cystathionine gamma-lyase (EC 4.4.1.1) (gamma-cystathionase), which catalyzes the transformation of cystathionine
into cysteine, oxobutanoate and ammonia. This is the final reaction in the transsulfuration pathway that leads
from methionine to cysteine in eukaryotes.

- Cystathionine gamma-synthase (EC 4.2.99.9), which catalyzes the conversion of cysteine and succinyl-homoserine
into cystathionine and succinate: the first step in the biosynthesis of methionine from cysteine in bacteria (gene
metB).

- Cystathionine beta-lyase (EC 4.4.1.8) (beta-cystathionase), which catalyzes the conversion of cystathionine into
homocysteine, pyruvate and ammonia: the second step in the biosynthesis of methionine from cysteine in bacteria
(gene metC).

- Methionine gamma-lyase (EC 4.4.1.11) (L-methioninase), which catalyzes the transformation of methionine into
methanethiol, oxobutanoate and ammonia.

- OAH/OAS sulfhydrylase, which catalyzes the conversion of acetylhomoserine into homocysteine and that of
acetylserine into cysteine (gene MET17 or MET25 in yeast).
- O-succinylhomoserine sulfhydrylase (EC 4.2.99.-).
- Yeast hypothetical protein YGL184c.
- Yeast hypothetical protein YHR112c.

[0377] These enzymes are proteins of about 400 amino-acid residues. The pyridoxal-P group is attached to a lysine 
residue located in the central section of these enzymes; the sequence around this residue is highly conserved and can 
be used as a signature pattern to detect this class of enzymes. 

- Consensus pattern: [DQ]-[LIVMF]-x(3)-[STAGC]-[STAGCI]-T-K-[FYWQ]-[LIVMF]-x-G-[HQ]-[SGNH] [K is the
pyridoxal-P attachment site]

[ 1] Ono B.I., Tanaka K., Naito K., Heike C., Shinoda S., Yamamoto S., Ohmori S., Oshima T., Toh-E A. J. Bacteriol.
174:3339-3347(1992).

[ 2] Barton A.B., Kaback D.B., Clark M.W., Keng T., Ouellette B.F.F., Storms R.K., Zeng B., Zhong W.W., Fortin
N., Delaney S., Bussey H. Yeast 9:363-369(1993).

[0378] 105. Cyt_reductase 
FAD/NAD-binding Cytochrome reductase 
Number of members: 60 
[1] Medline: 95111952 

Crystal structure of the FAD-containing fragment of corn nitrate reductase at 2.5 A resolution: relationship to other
flavoprotein reductases. 
Lu G, Campbell WH, Schneider G, Lindqvist Y; 
Structure 1994;2:809-821. 
[2] Medline: 92084635 

The sequence of squash NADH:nitrate reductase and its relationship to the sequences of other flavoprotein oxidore- 
ductases. A family of flavoprotein pyridine nucleotide cytochrome reductases. 
Hyde GE, Crawford NM, Campbell WH;

J Biol Chem 1991;266:23542-23547. 
[0379] 106. Cytidylyltrans
Phosphatidate cytidylyltransferase
Number of members: 21

[0380] Phosphatidate cytidylyltransferase (EC 2.7.7.41) [1,2,3] (also known as CDP-diacylglycerol synthase) (CDS)
is the enzyme that catalyzes the synthesis of CDP-diacylglycerol from CTP and phosphatidate (PA). CDP-diacylglycerol
is an important branch point intermediate in both prokaryotic and eukaryotic organisms. CDS is a membrane-bound 
enzyme. A conserved region located in the C-terminal part has been selected as a signature pattern. 

- Consensus pattern: S-x-[LIVMF]-K-R-x(4)-K-D-x-[GSA]-x(2)-[LI]-[PG]-x-H-G-G-[LIVM]-x-D-R-[LIVMF]-D

[ 1] Sparrow C.P., Raetz C.R.H.

J. Biol. Chem. 260:12084-12091(1985). 
[ 2] Shen H., Heacock P.N., Clancey C.J., Dowhan W. 

J. Biol. Chem. 271:789-795(1996). 
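[ 3] Saito S., Goto K., Tonosaki A., Kondo H.

J. Biol. Chem. 272:9503-9509(1997).
Number of members: 64

[...] eight-stranded, antiparallel beta-barrel structure. Such
a domain is known to exist in the following proteins:

- Prokaryotic catabolite gene activator protein (CAP).

- cAMP- and cGMP-dependent protein kinases (cAPK and cGPK).
Both types of kinases contain two tandem copies of the cyclic nucleotide-binding domain. The cAPKs are composed
of two different subunits: a catalytic chain and a regulatory chain, which contains both copies of the domain. The cGPKs
are single chain enzymes that include the two copies of the domain in their N-terminal section. The nucleotide specificity
of cAPK and cGPK is due to an amino acid in the conserved region of beta-barrel 7: a threonine that is invariant in
cGPK is an alanine in most cAPK.

- Vertebrate cyclic nucleotide-gated ion channels. Two such cation channels have
been fully characterized. One is found in rod cells, where it plays a role in visual signal transduction. It specifically binds
to cGMP, leading to an opening of the channel and thereby causing a depolarization of rod photoreceptors. In olfactory
epithelium a similar, cAMP-binding, channel plays a role in odorant signal transduction.

There are six invariant amino
acids in this domain, three of which are glycine residues that are thought to be essential for maintenance of the
structural integrity of the beta-barrel. Two signature patterns have been developed for this domain. The first pattern
is located within beta-barrels 2 and 3 and contains the first two conserved Gly. The second pattern is located within
beta-barrels 6 and 7 and contains the third conserved Gly as well as the three other invariant residues.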

First consensus pattern: [LIVM]-[VIC]-x(2)-G-[DENQTA]-x-[GAC]-x(2)-[LIVMFY](4)-x(2)-G

Second consensus pattern: lUVM^G-E^GAS^ ^ 

[ 1] Weber I.T., Shabb J.B., Corbin J.D. Biochemistry 28:6122-6127(1989).
[ 2] Kaupp U.B. Trends Neurosci. 14:150-157(1991).
[ 3] Shabb J.B., Corbin J.D. J. Biol. Chem. 267:5723-5726(1992). 
[0384] 109. (cadherin)

Cadherins extracellular repeated domain signature

Cadherins act as both receptor and ligand. A wide number of tissue-specific forms of cadherins are known:

- Epithelial (E-cadherin) (also known as uvomorulin or L-CAM) (CDH1).
- Neural (N-cadherin) (CDH2).
- Placental (P-cadherin) (CDH3).
- Retinal (R-cadherin) (CDH4).
- Vascular endothelial (VE-cadherin) (CDH5).
- Kidney (K-cadherin) (CDH6).
- Cadherin-8 (CDH8).
- Osteoblast (OB-cadherin) (CDH11).
- Brain (BR-cadherin) (CDH12).
- T-cadherin (truncated cadherin) (CDH13).
- Muscle (M-cadherin) (CDH14).
- Liver-intestine (LI-cadherin).
- EP-cadherin.
- Desmoglein 1 (desmosomal glycoprotein I).
- Desmoglein 2.
- Desmoglein 3 (Pemphigus vulgaris antigen).

[0387] The Drosophila fat protein [3] is a huge protein of over 5000 amino acids that contains 34 cadherin-like repeats
in its extracellular domain. 

[0388] The signature pattern that was developed for the repeated domain is located in its C-terminal extremity,
which is its best conserved region. The pattern includes two conserved aspartic acid residues as well as two aspar- 
agines; these residues could be implicated in the binding of calcium. 

[0389] Consensus pattern: [LIV]-x-[LIV]-x-D-x-N-D-[NH]-x-P. Sequences known to belong to this class detected by the
pattern: ALL. Note: this pattern is found in the first, second, and fourth copies of the repeated domain. In the third copy
there is a deletion of one residue after the second conserved Asp.
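
Consensus patterns of this kind can be checked mechanically. The following is a minimal Python sketch, not part of the original disclosure, that translates the pattern notation used here (square brackets for allowed residues, curly braces for excluded residues, x for any residue, a parenthesized number or range for repeats) into a regular expression. The function name prosite_to_regex and the test fragment are hypothetical and serve only as an illustration.

```python
import re

# One pattern element with an optional repeat suffix, e.g. "x(4,5)" or "[LIV](2)".
_ELEMENT = re.compile(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style consensus pattern into a Python regex string."""
    pieces = []
    for element in pattern.strip().rstrip(".").split("-"):
        element = element.strip()
        if not element:
            continue  # tolerate a trailing "-" left over from line wrapping
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"cannot parse element {element!r}")
        core, low, high = match.groups()
        if core == "x":
            piece = "."  # any residue
        elif core.startswith("[") and core.endswith("]") and core[1:-1].isalpha():
            piece = "[" + core[1:-1] + "]"  # one of the listed residues
        elif core.startswith("{") and core.endswith("}") and core[1:-1].isalpha():
            piece = "[^" + core[1:-1] + "]"  # any residue except those listed
        elif len(core) == 1 and core.isalpha() and core.isupper():
            piece = core  # a single fixed residue
        else:
            raise ValueError(f"unsupported element {element!r}")
        if low is not None:
            piece += "{" + low + ("," + high if high else "") + "}"
        pieces.append(piece)
    return "".join(pieces)


CADHERIN = "[LIV]-x-[LIV]-x-D-x-N-D-[NH]-x-P"
cadherin_regex = prosite_to_regex(CADHERIN)

# Made-up fragment used only to exercise the pattern; not a real cadherin sequence.
fragment = "AAVTIKDVNDNAPAA"
hit = re.search(cadherin_regex, fragment)
print(cadherin_regex, hit.start() + 1 if hit else None)
```

The sketch rejects any element it does not recognize instead of guessing, which is a deliberate choice: several patterns in this listing are damaged by text extraction and should fail loudly rather than match silently.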

[ 1] Takeichi M. Annu. Rev. Biochem. 59:237-252(1990). 
[ 2] Takeichi M. Trends Genet. 3:213-217(1987). 

[ 3] Mahoney P.A., Weber U., Onofrechuk P., Biessmann H., Bryant P.J., Goodman C.S. Cell 67:853-868(1991).
[0390] 110. Calreticulin family signatures

Calreticulin [1] (also known as calregulin, CRP55 or HACBP) is a high-capacity calcium-binding protein which is present
in most tissues and located at the periphery of the endoplasmic (ER) and the sarcoplasmic reticulum (SR) membranes.
It probably plays a role in the storage of calcium in the lumen of the ER and SR and it may well have other important
functions. Structurally, calreticulin is a protein of about 400 amino acid residues consisting of three domains: a) An N-
terminal, probably globular, domain of about 180 amino acid residues (N-domain); b) A central domain of about 70
residues (P-domain) which contains three repeats of an acidic 17 amino acid motif. This region binds calcium with a
low capacity but a high affinity; c) A C-terminal domain rich in acidic residues and in lysine (C-domain). This region
binds calcium with a high capacity but a low affinity. Calreticulin is evolutionarily related to the following proteins: -
Onchocerca volvulus antigen RAL-1. RAL-1 is highly similar to calreticulin, but possesses a C-terminal domain rich in
lysine and arginine and lacks acidic residues and is therefore not expected to bind calcium in that region. - Calnexin
[2]. A calcium-binding protein that interacts with newly synthesized glycoproteins in the endoplasmic reticulum. It seems
to play a major role in the quality control apparatus of the ER by the retention of incorrectly folded proteins. - Calmegin
[3] (or calnexin-T), a testis-specific calcium-binding protein highly similar to calnexin. Three signature patterns have
been developed for this family of proteins. The first two patterns are based on conserved regions in the N-domain; the
third pattern corresponds to positions 4 to 16 of the repeated motif in the P-domain.
Consensus pattern: [KRHN]-x-[DEQN]-[DEQNK]-x(3)-C-G-G-[AG]-[FY]-[LIVM]-[KN]-[LIVMFY](2)
Consensus pattern: [LIVM](2)-F-G-P-D-x-C-[AG]
Consensus pattern: [IV]-x-D-x-[DENST]-x(2)-K-P-[DEH]-D-W-[DEN]

[ 1] Michalak M., Milner R.E., Burns K., Opas M. Biochem. J. 285:681-692(1992).

[ 2] Bergeron J.J.M., Brenner M.B., Thomas D.Y., Williams D.B. Trends Biochem. Sci. 19:124-128(1994).

[ 3] Watanabe D., Yamada K., Nishina Y., Tajima Y., Koshimizu U., Nagata A., Nishimune Y. J. Biol. Chem. 269:7744-7749(1994).

[0391] 111. Eukaryotic-type carbonic anhydrases signature (carb_anhydrase) 

Carbonic anhydrases (EC 4.2.1.1) (CA) [1,2,3,4] are zinc metalloenzymes which catalyze the reversible hydration of
carbon dioxide. Eight enzymatic and evolutionarily related forms of carbonic anhydrase are currently known to exist in
vertebrates: three cytosolic isozymes (CA-I, CA-II and CA-III); two membrane-bound forms (CA-IV and CA-VII); a
mitochondrial form (CA-V); a secreted salivary form (CA-VI); and a yet uncharacterized isozyme [5]. In the alga
Chlamydomonas reinhardtii, two CA isozymes have been sequenced [6]. They are periplasmic glycoproteins evolu-
tionarily related to vertebrate CAs. Some bacteria, such as Neisseria gonorrhoeae [7], also have a eukaryotic-type CA.
CAs contain a single zinc atom bound to three conserved histidine residues. As a signature for CAs, a pattern has
been developed which includes one of these zinc-binding histidines. Protein D8 from Vaccinia and other poxviruses is
related to CAs but has lost two of the zinc-binding histidines as well as many otherwise conserved residues. This is
also true of the N-terminal extracellular domain of some receptor-type tyrosine-protein phosphatases (see
<PDOC00323>).

Consensus pattern: S-E-[HN]-x-[LIVM]-x(4)-[FYH]-x(2)-E-[LIVMGA]-H-[LIVMFA](2) [The second H is a zinc ligand]
Note: most prokaryotic CAs as well as plant chloroplast CAs belong to another, evolutionarily distinct family of proteins
(see <PDOC00586>).
[ 1] Deutsch H.F. Int. J. Biochem. 19:101-113(1987)

[ 2] Fernley R.T. Trends Biochem. Sci. 13:356-359(1988)

[ 3] Tashian R.E. BioEssays 10:186-192(1989) 

[ 4] Edwards Y. Biochem. Soc. Trans. 18:171-175(1990)

[ 5] Skaggs L.A., Bergenhem N.C.K., Venta R.J., Tashian R.E. Gene 126:291-292(1993).
[0392] 112. Caseins alpha/beta signature

The alpha/beta caseins are a rapidly diverging family of proteins. Two regions are nevertheless conserved: a cluster of
phosphorylated serine residues and the signal sequence. A signature pattern was developed for this family of proteins
based upon the last eight residues of the signal sequence.

Consensus pattern: C-L-[LV]-A-x-A-[LVF]-A

[ 1] Holt C., Sawyer L. Protein Eng. 2:251-259(1988)

[0393] 113. Catalase signatures

organisms and in some prokaryotes caZ e s Secut c^^ oH f ? T"" 9 " ,n eukar y°« c 

binds one protoheme IX group A conserved ^^nTs^ves ' S " bUnitS - Each of ,he sub ""^ 

residue has been used as a first suture ImZ^Z • , h ^ Pr ° X ' mal Skte ligand ^ "#» ^ound this 
binding. A conserved histidJhl^r^^^^ 3 TT - ^ * - P3rtiCipates h he ™" 

around this residue has been * ^ ^ -gion 

Consensus pattern: R-{UVMFSTANJ-F-|GASTNP]-Y-xVrASTTfQEHi fY « th» - ■ , 
Consensus pattern: Wx(4 H raj*>^ heme-binding ligand] 

Note: some promote causes ^.D^Uid^ 

M.G. J. Mol Biol. 188:63^72(7^) " **" Fita ' ' MUrthy M R N - Ross ™™ 

[ 3] von Ossowski I., Hausner G., Loewen P.C. J. Mol. Evol. 37:71-76(1993).

[0394] 114. (chitin binding) Chitin recognition or binding domain signature 

spe™^^s - a common binding 

of chitin subunits. It has been found in IheSSSSIdl^ 1 7" . h ,he rec ° 9nition « bindin 9 

charactered of these lectins are ^Si^iSS^J^"^ °' T^ 9 "™™ 8 pbnl ,ectins - The *« 
N-acetylglucosamine/N-acetylneuramini.^ ^SSwES^S,^ T 1" " n ' nS (WGA " 1, 2 and 3) WGA is a " 
43 amino acid domain. The same type of ^c.u re is old £fh V C ° nSiS,S °' 3 ,OUrfold re P eti,k} " °» ,he 

Plants endochitinases (EC ^l.l^S^i^S^^^iT^? 83 We " 33 3 riCe ' 
the hydrolysis of me beta-1,4lnl^sofN-acetvl^ a 5 22 ^ 2§22) ' Endochrt,nase s are enzymes that catalyze 
against chitin con«ainingfungaTSgr s cK^^ 
attheirN-term^extr^^ 
two copies of the domain. - Hevein [5] awS^ 

•wo wound-induced proteins from potato ^ES^S 5T," - ^ °' ^ " Wn ' and win2 > 
the linear plasmid pGKL1 is composed of three SS^S hSf » h ^ SM |3] ^ ,OXin encoded b V 
activity and inhibits growth of Jn^ ye^J^t^ h , ^ SUbUnit harbore to ^ 

proteohyticallyprocessedfromatergerprrriSto k ^ 3 ' Pha SUbunit " *«* is 

In chitinases, as well as in the ^10 *^33^^^^^ «achitinase(see^Doc00839>). 

quenceandisthereforeatmeN^rminaTofTe^ 
section of the protein. The domain cont^s e^ 

to be invoked in disulfide bonds. Jt^A^^Z 7 T *** *"* 3 " been Shown ' in W <SA. 
figure: + +||M1 ™ arra "9^ent of the four d.sulfide bonds is shown in the following 

xxCgxxxxxxxCx^CCsxxgxCgjocxxxCxxxCx^C I 1 «„iu 

volved in a disulfide bond.'.': position of the pattern. ""+---+ + + C: conserved cysteine in- 
- Consensus pattern: C-x(4,5)-C-C-S-x(2)-G-x-C-G-x(4)-[FYW]-C [The five C's are involved in disulfide bonds]

[ 1] Wright H.T., Sandrasegaram G., Wright C.S. J. Mol. Evol. 33:283-294(1991).
[ 2] Lerner D.R., Raikhel N.V. J. Biol. Chem. 267:11085-11091(1992).

[ 3] Butler A.R., O'Donnell R.W., Martin V.J., Gooday G.W., Stark M.J.R. Eur. J. Biochem. 199:483-488(1991).
[0395] 115. (Chitinase 1) Chitinases family 19 signatures

Chitinases (EC 3.2.1.14) [1] are enzymes that catalyze the hydrolysis of the beta-1,4-N-acetyl-D-glucosamine linkages
in chitin polymers. From the viewpoint of sequence similarity chitinases belong to either family 18 or 19 in the classi-
fication of glycosyl hydrolases [2,E1]. Chitinases of family 19 (also known as classes IA or I and IB or II) are enzymes
from plants that function in the defense against fungal and insect pathogens by destroying their chitin-containing cell
wall. Class IA/I and IB/II enzymes differ in the presence (IA/I) or absence (IB/II) of an N-terminal chitin-binding domain
(see the relevant entry <PDOC00025>). The catalytic domain of these enzymes consists of about 220 to 230 amino acid
residues. Two highly conserved regions have been selected as signature patterns; the first one is located in the N-
terminal section and contains one of the six cysteines which are conserved in most, if not all, of these chitinases and
which is probably involved in a disulfide bond.

Consensus pattern: C-x(4,5)-F-Y-[ST]-x(3)-[FY]-[LIVMF]-x-A-x(3)-[YF]-x(2)-F-[GSA]
Consensus pattern: [LIVM]-[GSA]-F-x-[STAG](2)-[LIVMFY]-W-[FY]-W-[LIVM]

[ 1] Flach J., Pilet R-E., Jolles P. Experientia 48:701-716(1992).
[ 2] Henrissat B. Biochem. J. 280:309-316(1991).

[0396] 116. chloroa_b-bind

Chlorophyll A-B binding proteins. Number of members: 211 
[0397] 117. chromo 

The 'chromo' (CHRomatin Organization MOdifier) domain [1 to 4] is a conserved region of about 60 amino acids which
was originally found in Drosophila modifiers of variegation, which are proteins that modify the structure of chromatin
to the condensed morphology of heterochromatin, a cytologically visible condition where gene expression is repressed.
In protein Polycomb, the chromo domain has been shown to be important for chromatin targeting. Proteins that contain
a chromo domain seem to fall into three classes:

a) Proteins which have an N-terminal chromo domain followed by a region which is related to but distinct from the
chromo domain and which has been termed [3] the 'chromo shadow' domain.

b) Proteins with a single chromo domain. 

c) Proteins with paired tandem chromo domains. 

[0398] Currently, this domain has been found in the following proteins:
[0399] Class A. 

Drosophila heterochromatin protein Su(var)205 (HP1). 
Human heterochromatin protein HP1 alpha. 
Mammalian modifier 1 and modifier 2. 

Fission yeast swi6, a protein involved in the repression of the silent mating-type loci mat2 and mat3 
[0400] Class B. 

Drosophila protein Polycomb (Pc). 
Mammalian modifier 3, a homolog of Pc.

Drosophila protein Su(var)3-9, a suppressor of position-effect variegation.
Human Mi-2 autoantigen, characteristic of dermatomyositis.

Fungal retrotransposon polyproteins: 'skippy' from Fusarium oxysporum, 'grasshopper' and 'MAGGY' from Magnaporthe
grisea and CfT-1 from Cladosporium fulvum.

Fission yeast hypothetical protein SpAC18G6.02c.

Caenorhabditis elegans hypothetical protein C29H12.5.

Caenorhabditis elegans hypothetical protein ZK1236.2.

Caenorhabditis elegans hypothetical protein T09A5.8. 

[0401] Class C. 



Mammalian DNA-binding/helicase proteins CHD-1 to CHD-4.
Yeast protein CHD1. 



[0402] ™ eS ^P-^ 

[ 1] Paro R. Trends Genet. 6:416-421(1990) 

[ 3] Aasland R., Stewart A.F. Nucleic Acids Res. 23:3168-3173(1995).

[ 4] Koonin E.V., Zhou S., Lucchesi J.C. Nucleic Acids Res. 23:4229-4233(1995).

[0403] 118. citrate_synt

melal ion cefaclors. condensation. CS can d.rectly form a carbon-carbon bond in the absence of 

identical chains. mrocnondnal matrix, the second is cytoplasmic. Both seem to be dimers of 

- C«««. pa,™,: ^Fy*H3AJ-H.»-|, W ,. 2 WRKT>»(2 W PS| fl [H is an active * ,„„„., 

K53 m 3T*b BMM R °"" ns,0 ° SJ ' "««*wa«2i«2, W 9i^. 

Chaperonin clpA/B

Number of members: 39

are listed below. en Shown [1 ' 2 J to bo evolutionary related. These proteins 

- Yeast mitochondrial heat shock protein 78 (gene HSP78) [3].

- Porphyra purpurea chloroplast encoded clpC.

A and B motifs there are many parts in mese tw^aL ^ 9 ln »° ATP-binding 

selected as signature patterns" The r^^Zt^Z^l u^r^ TM> " ^ haVe bee " 
of the ATP-binding B motif. The second pLrn s^lnZ T^ lT^ ^ ^ r6SidUeS to me C - le ™ al 
motifs. P 6 " 18 l0Ca,ed ,n me second in-between the ATP-binding A and B 

- Consensus pattern: D-[AI]-[SGA]-N-[LIVMF](2)-K-[PT]-x-L-x(2)-G

- Consensus pattern: ^VMFYH>*«^^ 
[ 1] Gottesman S., Squires C., Pichersky E., Carrington M., Hobbs M., Mattick J.S., Dalrymple B., Kuramitsu H.,
Shiroza T., Foster T., Clark W.P., Ross B., Squires C.L., Maurizi M.R. Proc. Natl. Acad. Sci. U.S.A. 87:3513-3517
(1990).

[ 2] Parsell D.A., Sanchez Y., Stitzel J.D., Lindquist S. Nature 353:270-273(1991).

[ 3] Leonhardt S.A., Fearon K., Danese P.N., Mason T.L. Mol. Cell. Biol. 13:6304-6313(1993).

[0410] 120. cofilin_ADF
Cofilin/tropomyosin-type actin-binding proteins
[1] 

Medline: 97290449 
Structure determination of yeast cofilin. 
Fedorov AA, Lappalainen P, Fedorov EV, Drubin DG, Almo SC;

Nat Struct Biol 1997;4:366-369. 

[2] 

Medline: 97290450 

Crystal structure of the actin-binding protein actophorin from Acanthamoeba.
Leonard SA, Gittis AG, Petrella EC, Pollard TD, Lattman EE; 
Nat Struct Biol 1997;4:369-373. 
[3] 

Medline: 97420794 

F-actin and G-actin binding are uncoupled by mutation of conserved tyrosine residues in maize actin depolymerizing
factor. 

Jiang CJ, Weeds AG, Khan S, Hussey PJ; 
Proc Natl Acad Sci U S A 1997;94:9973-9978. 
[4] 

Medline: 97357155 

Cofilin promotes rapid actin filament turnover in vivo. 

Lappalainen P, Drubin DG; 
Nature 1997;388:78-82. 

Severs actin filaments and binds to actin monomers. 
Number of members: 44 

[0411] Actin-depolymerizing proteins sever actin filaments (F-actin) and/or bind to actin monomers, or G-actin, thus
preventing actin polymerization by sequestering the monomers. The following proteins are evolutionarily related and
belong to a family of low molecular weight (137 to 166 residues) actin-depolymerizing proteins [1,2,3,4]:

Cofilin from vertebrates, slime mold and yeast. Cofilin binds to F-actin and acts as a pH-dependent actin-depo-
lymerizing protein.

Destrin from vertebrates. Destrin binds to G-actin in a pH-independent manner and prevents polymerization.
Caenorhabditis elegans unc-60.
Acanthamoeba castellanii actophorin.
Plant actin depolymerizing factor (ADF).

[0412] The most conserved region of these proteins is a twenty amino-acid segment that ends some 30 residues 
from their C-terminal extremity. This segment has been shown [5] to be important for actin-binding. 

- Consensus pattern: P-[DE]-x-[SA]-x-[LIVMT]-[KR]-x-[KR]-M-[LIVM]-[YA]-[STA](3)-x(3)-[LIVMF]-[KR]
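
The statement that the conserved segment spans twenty residues can be checked against the pattern itself. The following is a minimal Python sketch, assuming the pattern reads P-[DE]-x-[SA]-x-[LIVMT]-[KR]-x-[KR]-M-[LIVM]-[YA]-[STA](3)-x(3)-[LIVMF]-[KR]; the helper name pattern_length is hypothetical. It sums the repeat counts of the pattern elements to obtain the minimum and maximum number of residues a match can cover.

```python
import re

# Optional repeat suffix at the end of an element: "(3)" or "(4,5)".
_REPEAT = re.compile(r"\((\d+)(?:,(\d+))?\)$")


def pattern_length(pattern: str) -> tuple[int, int]:
    """Return the (minimum, maximum) number of residues spanned by a pattern."""
    shortest = longest = 0
    for element in pattern.split("-"):
        element = element.strip()
        if not element:
            continue  # ignore a stray trailing "-"
        repeat = _REPEAT.search(element)
        if repeat:
            low = int(repeat.group(1))
            high = int(repeat.group(2)) if repeat.group(2) else low
        else:
            low = high = 1  # an element without a suffix covers one residue
        shortest += low
        longest += high
    return shortest, longest


# Assumed reading of the cofilin/ADF consensus pattern (see the lead-in).
COFILIN = (
    "P-[DE]-x-[SA]-x-[LIVMT]-[KR]-x-[KR]-M-[LIVM]-[YA]-[STA](3)-x(3)-[LIVMF]-[KR]"
)
print(pattern_length(COFILIN))
```

A length that disagrees with the prose is a useful hint that a pattern was damaged in transcription, for instance when two adjacent bracketed classes have been fused into one.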

[ 1] Hawkins M., Pope B., Maciver S.K., Weeds A.G. Biochemistry 32:9985-9993(1993).

[ 2] Iida K., Moriyama K., Matsumoto S., Kawasaki H., Nishida E., Yahara I. Gene 124:115-120(1993).

[ 3] Quirk S., Maciver S.K., Ampe C., Doberstein S.K., Kaiser D.A., van Damme J., Vandekerckhove J., Pollard T.D. Biochemistry 32:8525-8533(1993).

[ 4] McKim K.S., Matheson C., Marra M.A., Wakarchuk M.F., Baillie D.L. Mol. Gen. Genet. 242:346-357(1994).
[ 5] Moriyama K., Yonezawa N., Sakai H., Yahara I., Nishida E. J. Biol. Chem. 267:7240-7244(1992). 

[0413] 121. (Complex 24kd) Respiratory-chain NADH dehydrogenase 24 Kd subunit signature Respiratory-chain
NADH dehydrogenase (EC 1.6.5.3) [1,2] (also known as complex I or NADH-ubiquinone oxidoreductase) is an oligo-
meric enzymatic complex located in the inner mitochondrial membrane which also seems to exist in the chloroplast and
in cyanobacteria (as a NADH-plastoquinone oxidoreductase). Among the 25 to 30 polypeptide subunits of this bioen-
ergetic enzyme complex there is one with a molecular weight of 24 Kd (in mammals), which is a component of the iron-
sulfur (IP) fragment of the enzyme. It seems to bind a 2Fe-2S iron-sulfur cluster and is synthesized as a precursor form
with a transit peptide. The 24 Kd subunit is highly similar to [3,4]: - Subunit E of Escherichia coli NADH-ubiquinone
oxidoreductase. - Subunit NQO2 of Paracoccus denitrificans NADH-ubiquinone oxidoreductase. A highly conserved region
containing two conserved cysteines, which are probably involved in the binding of the 2Fe-2S center, has been selected
as a signature pattern.

- Consensus pattern: D-x(2)-F-[ST]-x(5)-C-L-G-x-C-x(2)-[GA]-P [The two C's are putative 2Fe-2S ligands]
[ 1] Ragan C.I. Curr. Top. Bioenerg. 15:1-36(1987)

[ 2] Weiss H., Friedrich T., Hofhaus G., Preis D. Eur. J. Biochem. 197:563-576(1991)

! W 3 H m Y i M n Wa,ker ' E - Bi °^ S - "40:105-134(1992) } " 

[ 4] Weidner U., Geier S., Rock A., Friedrich T., Leif H., Weiss H. J. Mol. Biol. 233:109-122(1993).

[0414] 122. copper-bind 

Copper binding proteins, plastocyanin/azurin family 
Number of members: 70 

- Cupredoxin (CPC) from cucumber peelings [4].

- Cusacyanin (basic blue protein; plantacyanin, CBP) from cucumber. 

- Stellacyanin from the Japanese lacquer tree. 

- Umecyanin from horseradish roots. 

' ^uZ^Z^Zt^ P *' en Pr ° ,ein 18 re ' a,ed 10 ^ ^ P""** b « — «o have 

lalCenrG^H?^^ — ".C. * « -em. 259:2822-2325(1934). 

[ 4] Mann K., Schaefer W., Thoenes U., Messerschmidt K., Mehrabian Z., Nalbandyan R. FEBS Lett. 314:220-223
[ 5] Mattar S., Scharf B., Kent S.B.H., Rodewald K., Oesterhelt D., Engelhard M. J. Biol. Chem. 269:14939-14945
[ 6] Yano T., Fukumori Y., Yamanaka T. FEBS Lett. 288:159-162(1991).
[0417] 123. Chaperonins cpn10 signature

Chaperonins [1,2] are proteins involved in the folding of proteins or the assembly of oligomeric protein complexes.
They seem to assist other polypeptides in maintaining or assuming conformations which permit their correct assembly
into oligomeric structures. They are found in abundance in prokaryotes, chloroplasts and mitochondria. Chaperonins
form oligomeric complexes and are composed of two different types of subunits: a 60 Kd protein, known as cpn60
(groEL in bacteria) and a 10 Kd protein, known as cpn10 (groES in bacteria). The cpn10 protein binds to cpn60 in the
presence of MgATP and suppresses the ATPase activity of the latter. Cpn10 is a protein of about 100 amino acid
residues whose sequence is well conserved in bacteria, vertebrate mitochondria and plant chloroplasts [3,4]. Cpn10
assembles as a heptamer that forms a dome [5]. As a signature pattern for cpn10, a region located in the N-terminal
section of the protein was selected.

Consensus pattern: [LIVMFY]-x-P-{ILT]-x-[DENHKR]-[LIVMFA](3H^ 

Note: this pattern is found twice in the plant chloroplast protein, which consists of a tandem repeat of a cpn10 domain.

[ 1] Ellis R.J., van der Vies S.M. Annu. Rev. Biochem. 60:321-347(1991).

[ 2] Zeilstra-Ryalls J., Fayet O., Georgopoulos C. Annu. Rev. Microbiol. 45:301-325(1991).

[ 3] Hartman D.J., Hoogenraad N.J., Condron R., Hoj P.B. Proc. Natl. Acad. Sci. U.S.A. 89:3394-3398(1992).

[ 4] Bertsch U., Soll J., Seetharam R., Virtanen P.V. Proc. Natl. Acad. Sci. U.S.A. 89:8696-8700(1992).

[ 5] Hunt J.F., Weaver A.J., Landry S.J., Gierasch L., Deisenhofer J. Nature 379:37-45(1996).

[0418] 124. Chaperonins cpn60 signature (cpn60_TCP1)

Chaperonins [1,2] are proteins involved in the folding of proteins or the assembly of oligomeric protein complexes.
Their role seems to be to assist other polypeptides to maintain or assume conformations which permit their correct
assembly into oligomeric structures. They are found in abundance in prokaryotes, chloroplasts and mitochondria. Chap-
eronins form oligomeric complexes and are composed of two different types of subunits: a 60 Kd protein, known as
cpn60 (groEL in bacteria) and a 10 Kd protein, known as cpn10 (groES in bacteria). The cpn60 protein shows weak
ATPase activity and is a highly conserved protein of about 550 to 580 amino acid residues which has been described
by different names in different species: - Escherichia coli groEL protein, which is essential for the growth of the bacteria
and the assembly of several bacteriophages. - Cyanobacterial groEL analogues. - Mycobacterium tuberculosis and
leprae 65 Kd antigen, Coxiella burnetii heat shock protein B (gene htpB), Rickettsia tsutsugamushi major antigen 58,
and Chlamydial 57 Kd hypersensitivity antigen (gene hypB). - Chloroplast RuBisCO subunit binding-protein alpha and
beta chains, which bind ribulose bisphosphate carboxylase small and large subunits and are implicated in the assembly
of the enzyme oligomer. - Mammalian mitochondrial matrix protein P1 (mitonin or P60). - Yeast HSP60 protein, a
mitochondrial assembly factor. As a signature pattern for these proteins, a rather well-conserved region of twelve
residues, located in the last third of the cpn60 sequence, was chosen.
Consensus pattern: A-[AS]-x-[DEQ]-E-x(4)-G-G-[GA]

[ 1] Ellis R.J., van der Vies S.M. Annu. Rev. Biochem. 60:321-347(1991). 

[ 2] Zeilstra-Ryalls J., Fayet O., Georgopoulos C. Annu. Rev. Microbiol. 45:301-325(1991).

[0419] Chaperonins TCP-1 signatures (cpn60_TCP1)

The TCP-1 protein [1,2] (Tailless Complex Polypeptide 1) was first identified in mice where it is especially abundant in
testis but present in all cell types. It has since been found and characterized in many other mammalian species, in
Drosophila and in yeast. TCP-1 is a highly conserved protein of about 60 Kd (556 to 560 residues) which participates
in a hetero-oligomeric 900 Kd double-torus shaped particle [3] with 6 to 8 other different subunits. These subunits, the
chaperonin containing TCP-1 (CCT) subunit beta, gamma, delta, epsilon, zeta and eta are evolutionarily related to TCP-
1 itself [4,5]. The CCT is known to act as a molecular chaperone for tubulin, actin and probably some other proteins.
[0420] The CCT subunits are highly related to archaebacterial counterparts: - TF55 and TF56 [6], a molecular chap-
erone from Sulfolobus shibatae. TF55 has ATPase activity, is known to bind unfolded polypeptides and forms an oligo-
meric complex of two stacked nine-membered rings. - Thermosome [7], from Thermoplasma acidophilum. The ther-
mosome is composed of two subunits (alpha and beta) and also seems to be a chaperone with ATPase activity. It
forms an oligomeric complex of eight-membered rings. The TCP-1 family of proteins is weakly, but significantly [8],
related to the cpn60/groEL chaperonin family (see <PDOC00268>). As signature patterns of this family of chaperonins,
three conserved regions located in the N-terminal domain were chosen.
Consensus pattern: [RKEL]-[ST]-x-[LMFY]-G-P-x-[GSA]-x-x-K-[LIVMF](2)

Consensus pattern: [LIVM]-[TS]-[NK]-D-[GA]-[AVNHK]-[TAV]-[LIVM](2)-x(2)-[LIVM]-x-[LIVM]-x-[SNH]-[PQH]
Consensus pattern: Q-[DEK]-x-x-[LIVMGTA]-[GA]-D-G-T

[ 1] Ellis J. Nature 358:191-192(1992).

[ 2] Nelson R.J., Craig E.A. Curr. Biol. 2:487-489(1992).

[ 3] Lewis V.A., Hynes G.M., Zheng D., Saibil H, Willison K.R. Nature 358:249-252(1992). 
[ 4] Kubota H., Hynes G., Carne A., Ashworth A., Willison K.R. Curr. Biol. 4:89-99(1994).

[ 6] Trent J.D., Nimmesgern E., Wall J.S., Hartl F.U., Horwich A.L. Nature 354:490-493(1991).

[ 7] Waldmann T., Lupas A., Kellermann J., Peters J., Baumeister W. Biol. Chem. Hoppe-Seyler 376:119-126(1995).
[ 8] Hemmingsen S.M. Nature 357:650-650(1992).

[0421] 125. cyclin (Cyclins) 

The cyclins include an internal duplication, which is related to that found in TFIIB and the RB protein.

[1]

Medline: 94203808

Evidence for a protein domain superfamily shared by the cyclins,
TFIIB and RB/p107.
Gibson TJ, Thompson JD, Blocker A, Kouzarides T;
Nucleic Acids Res 1994;22:946-952.
[2] 

Medline: 96164440 
The crystal structure of cyclin A 
Brown NR, Noble MEM, Endicott JA, Garman EF, Wakatsuki S,
Mitchell E, Rasmussen B, Hunt T, Johnson LN;
Structure 1995;3:1235-1247.
Complex of cyclin and cyclin dependent kinase.
[3] 

Medline: 96313126 

Structural basis of cyclin-dependent kinase activation by phosphorylation.
Russo AA, Jeffrey PD, Pavletich NP;

Nat Struct Biol. 1996;3:696-700.

Cyclins regulate cyclin dependent kinases (CDKs).

The most divergent prosite members have been included. Swiss:P22674, the Uracil-DNA glycosylase 2, is the highest
noise and may be related but has not been included.
Number of members: 189 

S, ^J^S^J^^ flThe 3 ^ ? in p COn,r °" in9 

main groups of cyclins: **' *° m Ma,ura »« 1 P^ing Factor (MPF). TTiere are two 

E *^'". h0m< ^'» satol » e ''''>'''«''>te'P«W"»alm»il4] 

SK-Tsr.sr sees* — , ^ - ,te -» - - ■. «- 

[ 1] Nurse P. Nature 344:503-508(1990).

[ 2] Norbury C., Nurse P. Curr. Biol. 1:23-24(1991)

[ 3] Lew D.J., Reed S.I. Trends Cell Biol. 2:77-81(1992)

[ 4] Nicholas J., Cameron K.R., Honess R.W. Nature 355:362-365(1992).

[0426] 126. Cystatin domain
.omisfan^leno?^ 
Type 1 cystatins (or stefins), molecules of about 100 amino acid residues with neither disulfide bonds nor carbo-
hydrate groups. 

Type 2 cystatins, molecules of about 115 amino acid residues which contain one or two disulfide loops near their 
C-terminus. 

- Kininogens, which are multifunctional plasma glycoproteins.

[0428] They are the precursor of the active peptide bradykinin and play a role in blood coagulation by helping to
position optimally prekallikrein and factor XI next to factor XII. They are also inhibitors of cysteine proteases. Structurally,
kininogens are made of three contiguous type-2 cystatin domains, followed by an additional domain (of variable length)
which contains the sequence of bradykinin. The first of the three cystatin domains seems to have lost its inhibitory
activity.

[0429] In all these inhibitors, there is a conserved region of five residues which has been proposed to be important 
for the binding to the cysteine proteases. The consensus pattern starts one residue before this conserved region. 

- Consensus pattern: [GSTEQKRV]-Q-[LIVT]-[VAF]-[SAGQ]-G-x-[LIVMNK]-x(2)-[LIVMFY]-x-[LIVMFYA]-[DENQKRHSIV]

[1] Barrett A.J. Trends Biochem. Sci. 12:193-196(1987). 
[2] Rawlings N.D., Barrett A.J. J. Mol. EvoL 30:60-71(1990). 
[3] Turk V., Bode W. FEBS Lett. 285:213-219(1991).

[4] Lustigman S., Brotman B., Huima T., Prince A.M. Mol. Biochem. Parasitol. 45:65-76(1991). 

[0430] 127. cytochromes (Cytochrome c) 
The Pfam entry does not include all prosite members.
The cytochrome 556 and cytochrome c' families are not included.
Number of members: 259

[0431] In proteins belonging to the cytochrome c family [1], the heme group is covalently attached by thioether bonds to
two conserved cysteine residues. The consensus sequence for this site is Cys-X-X-Cys-His and the histidine residue
is one of the two axial ligands of the heme iron. This arrangement is shared by all proteins known to belong to the cyto-
chrome c family, which presently includes cytochromes c, c', c1 to c6, c550 to c556, cc3/Hmc, cytochrome f and reaction
center cytochrome c.

- Consensus pattern: C-{CPWHF}-{CPWR}-C-H-{CFYW} 
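
In this pattern notation, residues listed in curly braces are excluded at that position, so the pattern is stricter than the plain Cys-X-X-Cys-His motif given in the text. The following is a minimal Python sketch, assuming the pattern exactly as printed above, that scans a sequence for the heme-binding site. The function name heme_sites is hypothetical, and the test fragment merely resembles the N-terminal region of a mitochondrial cytochrome c; it is illustrative input, not data from this document.

```python
import re

# C-{CPWHF}-{CPWR}-C-H-{CFYW}: each curly-brace class becomes a negated character class.
HEME_SITE = re.compile(r"C[^CPWHF][^CPWR]CH[^CFYW]")


def heme_sites(sequence: str) -> list[int]:
    """Return the 1-based position of the first Cys of every non-overlapping match."""
    return [m.start() + 1 for m in HEME_SITE.finditer(sequence.upper())]


# Illustrative fragment only (see the lead-in); the motif here is C-A-Q-C-H-T.
fragment = "GDVEKGKKIFVQKCAQCHTVEKGGKHKTG"
print(heme_sites(fragment))

# A proline directly after the first Cys is excluded by {CPWHF}, so nothing is reported.
print(heme_sites("GDVEKGKKIFVQKCPQCHTVEKGGKHKTG"))
```

Because the last element of the pattern is itself an exclusion class, a Cys-X-X-Cys-His motif at the very end of a sequence is not reported by this sketch: the pattern requires one further residue after the histidine.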

[0432] [ 1] Mathews F.S. Prog. Biophys. Mol. Biol. 45:1-56(1985).

[0433] 128. (DAGKa) Diacylglycerol kinase accessory domain (presumed) 

[0434] Diacylglycerol (DAG) is a second messenger that acts as a protein kinase C activator. This domain is assumed 
to be an accessory domain: its function is unknown. 

[0435] [1] Sakane F, Yamada K, Kanoh H, Yokoyama C, Tanabe T, Nature 1990;344:345-348. [2] Sakane F, Imai S,
Kai M, Wada I, Kanoh H, J Biol Chem 1996;271:8394-8401. [3] Schaap D, de Widt J, van der Wal J, Vandekerckhove
J, van Damme J, Gussow D, Ploegh HL, van Blitterswijk WJ, van der Bend RL, FEBS Lett 1990;275:151-158. [4]
Kanoh H, Yamada K, Sakane F, Trends Biochem Sci 1990;15:47-50.
[0436] 129. (DAGKc) Diacylglycerol kinase catalytic domain (presumed) 

[0437] Diacylglycerol (DAG) is a second messenger that acts as a protein kinase C activator. The catalytic domain
is assumed from the finding of bacterial homologues.

[0438] [1] Sakane F, Yamada K, Kanoh H, Yokoyama C, Tanabe T, Nature 1990;344:345-348. [2] Sakane F, Imai S,
Kai M, Wada I, Kanoh H, J Biol Chem 1996;271:8394-8401. [3] Schaap D, de Widt J, van der Wal J, Vandekerckhove
J, van Damme J, Gussow D, Ploegh HL, van Blitterswijk WJ, van der Bend RL, FEBS Lett 1990;275:151-158. [4]
Kanoh H, Yamada K, Sakane F, Trends Biochem Sci 1990;15:47-50.

[0439] 130. D-amino acid oxidases signature (DAO)

[0440] D-amino acid oxidase (EC 1.4.3.3) (DAMOX or DAO) is an FAD flavoenzyme that catalyzes the oxidation of
neutral and basic D-amino acids into their corresponding keto acids. DAOs have been characterized and sequenced
in fungi and vertebrates where they are known to be located in the peroxisomes. D-aspartate oxidase (EC 1.4.3.1)
(DASOX) [1] is an enzyme, structurally related to DAO, which catalyzes the same reaction but is active only toward
dicarboxylic D-amino acids. In DAO, a conserved histidine has been shown [2] to be important for the enzyme's catalytic
activity. The conserved region around this residue has been developed as a signature pattern for these enzymes.
[0441] Consensus pattern: [LIVM](2)-H-[NHA]-Y-G-x-[GSA](2)-x-G-x(5)-G-x-A [H is a probable active site residue]
[ 1] Negri A., Ceciliani F., Tedeschi G., Simonic T., Ronchi S. J. Biol. Chem. 267:11865-11871(1992).

[ 2] Miyano M., Fukui K., Watanabe F., Takahashi S., Tada M., Kanashiro M., Miyake Y. J. Biochem. 109:171-177(1991).

[0442] 131. DEAD and DEAH box families ATP-dependent helicases signatures 

ITS* °' ,! Ukary ° tiC r d P roka, y o,ic P roteins have been characterized [1.2.3] on the basis of their structural simi- 
tar,* They all seem to be mvolved in ATP-dependent. nucleic-acid unwinding. Proteins currently iLnto bltaT 

ar ! : J,' n ^ 1i0n ^ ^ F ° Und h 8Ukar y° ,es - ,his P ro,ei " * * subuni, S a?at^ Tmolecuter 
^and^^^ 

ATPase and DNA-helicase activities in vitro. I, is Jfced^ P6B 

cr^« S , £ 10358 ' nVOlVed n SXCision re P air of DNA ^ged by UV light bulky adducts or 

proo^S ,™ ,2VM ^ yL^tp£ v )( , . Ch " ,mP ° rtam ,0f chromosome transmission and normal cell cycle 
5SZ^S^2^Ln^ h ^ e,ira, P rote " YKL078w - - Caenorhabdrtis e.egans hypotheL. 

[0443] Consensus pattern: [LIVMF](2)-D-E-A-D-[RKEM]-x-[LIVMFYGSTN]
Consensus pattern: [GSAH]-x-[LIVMF](3)-D-E-[ALIV]-H-[NECR] 

en ^"^ 

[ 1] Schmid S.R., Linder P. Mol. Microbiol. 6:283-292(1992)
[2lLWerP..LaskoP..AshburnerM..LeroyP.. N ie b enP.J..N«hiK..Schnier^ 

[ 3] Wassarman D.A.. Steitz J.A. Nature 349:463-464(1991) 

[ 4] Hodgman T.C. Nature 333:22-23(1988) and Nature 333:578-578(1988) (Errata)

[ 5] Harosh I., Deschavanne P. Nucleic Acids Res. 19:6331-6331(1991) 

[ 6] Koonin E.V., Senkevich T.G. J. Gen. Virol. 73:989-993(1992).

!SS l?n ( KT-T haSe) 3.^ihydroxy-2-bu.anone 4-phosphate synthase 

S^,i, V ^ 4 ^"« ,h8te is Synthesized from ribulose 5-phosphate and serves as the bio- 

33? SST? K S * rib ° ,,aVin SOme,imeS ,OUnd aS a bifu ™*°" al -zyme with G T cvcloTvdrn" 

[uwj 1 33. (DHDPS) Dihydrodipicolinate synthetase signatures 

D,hydrodipicolina te synthetase (EC 4^52) (DHDPS) (,| ca,a.yzes. in higher plants chtoroplas, and in many bacteria 
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(gene dapA), the first reaction specific to the biosynthesis of lysine and of diaminopimelate. DHDPS is responsible for 
the condensation of aspartate semialdehyde and pyruvate by a ping-pong mechanism in which pyruvate first binds to
the enzyme by forming a Schiff-base with a lysine residue. Three other proteins are structurally related to DHDPS and
probably also act via a similar catalytic mechanism: - Escherichia coli N-acetylneuraminate lyase (EC 4.1.3.3) (gene
nanA), which catalyzes the condensation of N-acetyl-D-mannosamine and pyruvate to form N-acetylneuraminate. -
Rhizobium meliloti protein mosA [3], which is involved in the biosynthesis of the rhizopine 3-O-methyl-scyllo-inosamine.
- Escherichia coli hypothetical protein yjhH. Two signature patterns for these enzymes were developed. The first one
is centered on a highly conserved region in the N-terminal part of these proteins. The second signature contains a lysine
residue which has been shown, in Escherichia coli dapA [2], to be the one that forms a Schiff-base with the substrate.
[0448] Consensus pattern: [GSA]-[LIVM]-[LIVMFY]-x(2)-G-[ST]-[TG]-G-E-[GASNF]-x(6)-[EQ]-
Consensus pattern: Y-[DNS]-[LIVMFA]-P-x(2)-[ST]-x(3)-[LIVMG]-x(13,14)-[LIVM]-x-[SGA]-[LIVMF]-K-[DEQAF]-
[STAC] [K is involved in Schiff-base formation]-

[ 1] Kaneko T., Hashimoto T., Kumpaisal R., Yamada Y. J. Biol. Chem. 265:17451-17455(1990).

[ 2] Laber B., Gomis-Rueth F.-X., Romao M.J., Huber R. Biochem. J. 288:691-695(1992).

[ 3] Murphy P.J., Trenz S.R., Grzemski W., de Bruijn F.J., Schell J. J. Bacteriol. 175:5193-5204(1993).

[0449] 134. (DHOdehase) Dihydroorotate dehydrogenase signatures 

Dihydroorotate dehydrogenase (EC 1.3.3.1) (DHOdehase) catalyzes the fourth step in the de novo biosynthesis of
pyrimidine, the conversion of dihydroorotate into orotate. DHOdehase is a ubiquitous FAD flavoprotein. In bacteria
(gene pyrD), DHOdehase is located on the inner side of the cytosolic membrane. In some yeasts, such as in Saccharomyces
cerevisiae (gene URA1), it is a cytosolic protein while in other eukaryotes it is found in the mitochondria [1].
The sequence of DHOdehase is rather well conserved and two signature patterns were developed specific to this enzyme.
The first corresponds to a region in the N-terminal section of the enzyme while the second is located in the C-terminal
section and seems to be part of the FAD-binding domain.

Consensus pattern: [GS]-x(4)-[GK]-[GSTA]-[LIWSTA]-[GT]-x(3)-[NQR]-x-G-[NHY]-x(2)-P-[RT]

[0450] Consensus pattern: [LIVM](2)-[GSA]-x-G-G-[IV]-x-[STGDN]-x(3)-[ACV]-x(6)-G-A

[0451] [ 1] Nagy M., Lacroute F., Thomas D. Proc. Natl. Acad. Sci. U.S.A. 89:8966-8970(1992). 

[0452] 135. (DMRL_synthase) 6,7-dimethyl-8-ribityllumazine synthase

[0453] 136. (DNA_methylase) C-5 cytosine-specific DNA methylases signatures

C-5 cytosine-specific DNA methylases (EC 2.1.1.73) (C5 Mtase) are enzymes that specifically methylate the C-5 carbon
of cytosines in DNA [1,2,3]. Such enzymes are found in the proteins described below. - As a component of type II
restriction-modification systems in prokaryotes and some bacteriophages. Such enzymes recognize a specific DNA
sequence where they methylate a cytosine. In doing so, they protect DNA from cleavage by type II restriction enzymes
that recognize the same sequence. The sequences of a large number of type II C-5 Mtases are known. - In vertebrates,
there are a number of C-5 Mtases that methylate CpG dinucleotides. The sequence of the mammalian enzyme is
known. C-5 Mtases share a number of short conserved regions. Two of them were selected. The first is centered around
a conserved Pro-Cys dipeptide in which the cysteine has been shown [4] to be involved in the catalytic mechanism; it
appears to form a covalent intermediate with the C6 position of cytosine. The second region is located at the C-terminal
extremity in type II enzymes.

[0454] Consensus pattern: [DENKS]-x-[FLIV]-x(2)-[GSTC]-x-P-C-x(2)-[FYWLIM]-S [C is the active site residue]-
Consensus pattern: [RKQGTF]-x(2)-G-N-[STAG]-[LIVMF]-x(3)-[LIVMT]-x(3)-[LIVM]-x(3)-[LIVM]-

[ 1] Posfai J., Bhagwat A.S., Roberts R.J. Gene 74:261-263(1988).

[ 2] Kumar S., Cheng X., Klimasauskas S., Mi S., Posfai J., Roberts R.J., Wilson G.G. Nucleic Acids Res. 22:1-10
(1994).

[ 3] Lauster R., Trautner T.A., Noyer-Weidner M. J. Mol. Biol. 206:305-312(1989).

[ 4] Chen L., McMillan A.M., Chang W., Ezak-Nipkay K., Lane W.S., Verdine G.L. Biochemistry 30:11018-11025
(1991).

[0455] 137. (DNAphotolyase) DNA photolyases class 2 signatures

Deoxyribodipyrimidine photolyase (EC 4.1.99.3) (DNA photolyase) [1,2] is a DNA repair enzyme. It binds to UV-damaged
DNA containing pyrimidine dimers and, upon absorbing a near-UV photon (300 to 500 nm), breaks the cyclobutane
ring joining the two pyrimidines of the dimer. DNA photolyase is an enzyme that requires two chromophore-cofactors
for its activity: a reduced FADH2 and either 5,10-methenyltetrahydrofolate (5,10-MTHF) or an oxidized
8-hydroxy-5-deazaflavin (8-HDF) derivative (F420). The folate or deazaflavin chromophore appears to function as an antenna,
while the FADH2 chromophore is thought to be responsible for electron transfer. On the basis of sequence similarities
[3] DNA photolyases can be grouped into two classes. The second class contains enzymes from Myxococcus xanthus,
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cially in the C-terminal part. Two of these raS^^SST 9 ^ CbSS 2 DN Aphotolyases, espe- 

Consensus pattern: F,^UvS-& 
■ Consensus pattern: G-x-H-D-x(2)- W -x-E.R-x4^ 

I sfe^^sr &K 12:259 - 261(19B7 >- 

[ 3] Yasui A., Eker A.P.M., Yasuhira S., Yajima H., Kobayashi T., Takao M., Oikawa A. EMBO J. 13:6143-6151(1994).

[0456] (DNAphotolyase2) DNA photolyases class 1 signatures
Deoxyribodipyrimidine photolyase (EC 4.1.99.3) (DNA photolyase) [1,2] is a DNA repair enzyme. It binds to UV-damaged
DNA containing pyrimidine dimers and, upon absorbing a near-UV photon (300 to 500 nm), breaks the cyclobutane
ring joining the two pyrimidines of the dimer. DNA photolyase is an enzyme that requires two chromophore-cofactors
for its activity: a reduced FADH2 and either 5,10-methenyltetrahydrofolate (5,10-MTHF) or an oxidized
8-hydroxy-5-deazaflavin (8-HDF) derivative (F420). The folate or deazaflavin chromophore appears to function as an
antenna, while the FADH2 chromophore is thought to be responsible for electron transfer. On the basis of sequence
similarities [3] DNA photolyases can be grouped into two classes. The first class contains enzymes from Gram-negative
and Gram-positive bacteria, the halophilic archaebacterium Halobacterium halobium, fungi and plants. Class 1 enzymes
bind either 5,10-MTHF (E. coli, fungi, etc.) or 8-HDF (S. griseus, H. halobium). This family also includes Arabidopsis
cryptochromes 1 (CRY1) and 2 (CRY2), which are blue light photoreceptors that mediate blue light-induced gene
expression. There are a number of conserved sequence regions in all known class 1 DNA photolyases, especially in the
C-terminal part. Two of these regions were selected as signature patterns.

[0457] Consensus pattern: T-G-x-P-[LIVM](2)-D-A-x-M-[RA]-x-[LIVM]-
Consensus pattern. HW*4*U^^ W . [KRQ] . 

[ 3] Yasui A., Eker A.P.M., Yasuhira S., Yajima H., Kobayashi T., Takao M., Oikawa A. EMBO J. 13:6143-6151(1994).

[ 4] Lin C., Ahmad M., Cashmore A.R. Plant J. 10:893-902(1996).

[0458] 138. (DNA_pol_A)

DNA polymerase family A signature

Replicative DNA polymerases (EC 2.7.7.7) are the key enzymes catalyzing the accurate replication of DNA. They require
either a small RNA molecule or a protein as a primer for the de novo synthesis of a DNA chain. On the basis of sequence
similarities a number of DNA polymerases have been grouped together [1,2,3] under the designation of
DNA polymerase family A. The polymerases that belong to this family are listed below.

- Bacteriophage SPO1 polymerase.

- Bacteriophage SPO2 polymerase.

- Bacteriophage T5 polymerase.

- Bacteriophage T7 polymerase.

- Mycobacteriophage L5 polymerase.

- Yeast mitochondrial polymerase gamma (gene MIP1).

substrates; it contains a conserved tyrosine which has been shown to be in the active site, and a conserved lysine,
also part of this region. This region was used as a signature for this family of DNA polymerases.

[ 1] Delarue M., Poch O., Tordo N., Moras D., Argos P. Protein Eng. 3:461-467(1990).



EP 1 033 405 A2 



[ 2] Ito J., Braithwaite D.K. Nucleic Acids Res. 19:4045-4057(1991).
[ 3] Braithwaite D.K., Ito J. Nucleic Acids Res. 21:787-802(1993). 

[0461] 139. DNA_pol_viral_C

DNA polymerase (viral) C-terminal domain 

Number of members: 128 

[0462] 140. (DNA_topoisoII)

DNA topoisomerase II signature 

DNA topoisomerase II (EC 5.99.1.3) [1,2,3,4,E1] is one of the two types of enzyme that catalyze the interconversion
of topological DNA isomers. Type II topoisomerases are ATP-dependent and act by passing a DNA segment through
a transient double-strand break. Topoisomerase II is found in phages, archaebacteria, prokaryotes, eukaryotes, and
in African Swine Fever virus (ASF). In bacteriophage T4 topoisomerase II consists of three subunits (the product of
genes 39, 52 and 60). In prokaryotes and in archaebacteria the enzyme, known as DNA gyrase, consists of two subunits
(genes gyrA and gyrB [E2]). In some bacteria, a second type II topoisomerase has been identified; it is known as
topoisomerase IV and is required for chromosome segregation; it also consists of two subunits (genes parC and parE).
In eukaryotes, type II topoisomerase is a homodimer.

[0463] There are many regions of sequence homology between the different subtypes of topoisomerase II. The 
relation between the different subunits is shown in the following representation: 



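<------------------ About 1400 residues ------------------>

[ Protein 39 -*- ][ ---------- Protein 52 ---------- ]   Phage T4

[ gyrB ------*-- ][ ---------- gyrA ---------------- ]   Prokaryote II, Archaebacteria

[ parE ------*-- ][ ---------- parC ---------------- ]   Prokaryote IV

[ -----------*-------------------------------------- ]   Eukaryote and ASF

'*': Position of the pattern.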

[0464] As a signature pattern for this family of proteins, a region that contains a highly conserved pentapeptide was 
selected. The pattern is located in gyrB, in parE, and in protein 39 of phage T4 topoisomerase. 
[0465] Consensus pattern: [LIVMA]-x-E-G-[DN]-S-A-x-[STAG] Sequences known to belong to this class detected by
the pattern: ALL

[ 1] Sternglanz R. Curr. Opin. Cell Biol. 1:533-535(1990). 

[ 2] Bjornsti M.-A. Curr. Opin. Struct. Biol. 1:99-103(1991). 

[ 3] Sharma A., Mondragon A. Curr. Opin. Struct. Biol. 5:39-47(1995).

[ 4] Roca J. Trends Biochem. Sci. 20:156-160(1995).
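
A consensus pattern such as the topoisomerase II signature above can be checked against a sequence by translating it into a regular expression. The following is only a minimal sketch, assuming the usual PROSITE conventions (`x` for any residue, `[...]` for allowed residues, `{...}` for excluded residues, `(n)` or `(n,m)` for repeat counts). The function names and the toy sequence in the usage note are hypothetical and are not taken from this specification.

```python
import re

# The topoisomerase II signature as read from the text above.
TOPO_II_PATTERN = "[LIVMA]-x-E-G-[DN]-S-A-x-[STAG]"

# One pattern element: a core (residue, x, [..] or {..}) plus an optional repeat count.
_ELEMENT = re.compile(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression string."""
    pieces = []
    for element in pattern.strip().rstrip(".").split("-"):
        element = element.strip()
        if not element:
            continue  # tolerate a trailing '-' as printed in the text
        core, low, high = _ELEMENT.fullmatch(element).groups()
        if core == "x":
            core = "."  # any residue
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"  # excluded residues
        elif not (len(core) == 1 or (core.startswith("[") and core.endswith("]"))):
            raise ValueError("unsupported pattern element: %r" % element)
        if low is None:
            repeat = ""
        elif high is None:
            repeat = "{%s}" % low
        else:
            repeat = "{%s,%s}" % (low, high)
        pieces.append(core + repeat)
    return "".join(pieces)


def find_signature(pattern, sequence):
    """Return (1-based start, matched segment) for each non-overlapping hit."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.group()) for m in regex.finditer(sequence.upper())]
```

For example, `find_signature(TOPO_II_PATTERN, "MKTLKEGDSAGGPQ")` reports a single hit on the made-up peptide, starting at residue 4. Overlapping hits are not reported by this sketch.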

[0466] 141. (DSPc) Tyrosine specific protein phosphatases signature and profiles

Tyrosine specific protein phosphatases (EC 3.1.3.48) (PTPase) [1 to 5] are enzymes that catalyze the removal of a
phosphate group attached to a tyrosine residue. These enzymes are very important in the control of cell growth,
proliferation, differentiation and transformation. Multiple forms of PTPase have been characterized and can be classified
into two categories: soluble PTPases and transmembrane receptor proteins that contain PTPase domain(s). The
currently known PTPases are listed below: Soluble PTPases. - PTPN1 (PTP-1B). - PTPN2 (T-cell PTPase; TC-PTP). -
PTPN3 (H1) and PTPN4 (MEG), enzymes that contain an N-terminal band 4.1-like domain (see <PDOC00566>) and
could act at junctions between the membrane and cytoskeleton. - PTPN5 (STEP). - PTPN6 (PTP-1C; HCP; SHP) and
PTPN11 (PTP-2C; SH-PTP3; Syp), enzymes which contain two copies of the SH2 domain at their N-terminal extremity.
The Drosophila protein corkscrew (gene csw) also belongs to this subgroup. - PTPN7 (LC-PTP; Hematopoietic
protein-tyrosine phosphatase; HePTP). - PTPN8 (70Z-PEP). - PTPN9 (MEG2). - PTPN12 (PTP-G1; PTP-P19). - Yeast PTP1.




of cdc2,Yeast CDC 14 which may be involved in ch^o^TI PVP ^ * con,rlbu,es •» ^phosphorylation 
yopH).-Au.oa ra pna C ^ 

MAP kinase phosphatase-1- MKP-1V which deon^Tf , , eDual ^ec«ficrty PTPases. -DUSP1 (PTPN10- 

DUSP3 (VHR). - DUSP4 (HVH2) - DUSPSfSST n^ol ^ ERK2 °" bolh Thr T V r r «idues - 
a PTPase that dephosphoryiates M^^t^FUS Yeast^Wi ^ DUSP7 (P / Sl2: MKp -X) - Yeast MSGS, 
Phosphatase. Receptor PTPases. Structurally all kno wn SIITptp V " US Hl PTPaSo; a dual 

cellular domain, followed by a transmembrane region and a C-terminal catalytic cytoplasmic domain. Some of the
receptor PTPases contain fibronectin type III (FN-III) repeats, immunoglobulin-like domains, MAM domains or carbonic
anhydrase-like domains in their extracellular region. The cytoplasmic region generally contains two copies of the
PTPase domain. The first seems to have enzymatic activity, while the second is inactive but seems to affect substrate
specificity of the first. In these domains, the catalytic cysteine is generally conserved but some other, presumably
important, residues are not. In the following table, the domain structure of known receptor PTPases is shown:

Extracellular Intracellular

2 0 0 2Leukccyte antigen related (LAR) 3 8 OoT^^iS^SfXSS^ (LCA) (CD45) ° 
(LRP) 0 00 0 2PTP-beta 016 0 01 PTP-gamma 0 ? e£ >7 0 0 2 T 2 2 ° ° 2PTP *' pha 
0 1 2PTP-mu 14 01 2PTP-zeta 0110 2PTPa Sa cklnZZ f . u p TP-eps.lon 00 0 0 2PTP-kap pa 1 4 

cysteines; the second one has been shown to be required for activity. Furthermore, a number of conserved
residues in its immediate vicinity have also been shown to be important. A signature pattern for PTPase domains was
derived centered on the active site cysteine. There are three profiles for PTPases: the first one spans the complete
domain and is not specific to any subtype, the second profile is specific to dual-specificity PTPases and the third one
to the PTP subfamily.

[0467] Consensus pattern: [LIVMF]-H-C-x(2)-G-x(3)-[STC]-[STAGP]-x-[LIVMFY] [C is the active site residue]-

[ 1] Fischer E.H., Charbonneau H., Tonks N.K. Science 253:401-406(1991).

[ 2] Charbonneau H., Tonks N.K. Annu. Rev. Cell Biol. 8:463-493(1992).
[ 3] Trowbridge I.S. J. Biol. Chem. 266:23517-23520(1991).

[ 4] Tonks N.K., Charbonneau H. Trends Biochem. Sci. 14:497-500(1989).
[ 5] Hunter T. Cell 58:1013-1016(1989).

[0468] 142. (DUF10) Uncharacterized protein family UPF0076 signature

may inhibit an initiation stage of cell-free protein sv^I £T £ ^ S °' Uble pr ° ,ein < PSP1 >- PSp 1 & 

mosome V hypothetical prcL YEROsTc - YeTst ch o^Lm^X h ^TT*" ^ HRSP12 " ' YeaSl Chfo " 
egans hypothetical protein C23G10.2 -LhSS^^^T^^^ Y,L051a " Cae n^abditis el- 
tein yhaR. - Escherichia coli ^^fZ^y^Zi^ ? ^ " ESCh8riChia C0,i pro- 
- Escherichia coli hypotheti J^roteirty^ Haemophilus influenzae protein. 

hyr*>.hetk:a. P ro«e^ 

xanthus dfrA. - Synechocystis strain PCC 6803 K25 vSl^ZV* " ^ 0C f= cus lactis al °R- " Myxococcus 
plasmid hypothetical protein y4sK. - PyrooocoMkc^ L^ ? ■' ^ a0b,urn s,rain NGR234 symbiotic 
around 15 Kd whose sequence is hiZ^^lT^T ^ ^ PH0854 are sma.. proteins of 
terminal part of these proteins was selected 9 Pattem ' 3 W8 " COnSe,ved "n the C- 

10469, Consensus pa«em: [PA H ASTPV ] -R- [SA cVF ] -x- [ L.VMFY,-x(2 H GSAKR> x-^xfS.SHLIVMJ-E-IM.,- 
[ 1] Bairoch A. Unpublished observations (1995).

tSIOka T, Tsuii K. Noda C, Sakai K., Hong Y.-M.. Suzuki ... Munoz S.. Natori Y. J. Biol. Chem ?7 Q,^n.^ 

[0470] 143. (DUF3)Domain of Unknown Function 3 

StT T4?TDUF^T n9 i eXC,USiVe ^ ^ eUbaC,efia Unk " OWn func «°" 
[0472] 144. (DUF6) Integral membrane protein

Siel^^ 

[0473] 145. (DUF7) Integral membrane protein 

[0474] This family includes many hypothetical membrane proteins of unknown function. Sw^l4502 has been
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implicated in resistance to ethidium bromide. 

[0475] 146. (DapB) Dihydrodipicolinate reductase signature

Dihydrodipicolinate reductase (EC 1.3.1.26) catalyzes the second step in the biosynthesis of diaminopimelic acid and
lysine, the NAD or NADP-dependent reduction of 2,3-dihydrodipicolinate into 2,3,4,5-tetrahydrodipicolinate. This
enzyme is present in bacteria (gene dapB) and higher plants. As a signature pattern the best conserved region in this
enzyme was selected. It is located in the central section and is part of the substrate-binding region [1].
[0476] Consensus pattern: E-[IV]-x-E-x-H-x(3)-K-x-D-x-P-S-G-T-A-
[0477] [ 1] Scapin G., Blanchard J.S., Sacchettini J.C. Biochemistry 34:3502-3512(1995).
[0478] 147. DedA family 

[0479] This family combines the DedA related proteins and YIAN/YGIK family. Members of this family are not func- 
tionally characterised. These proteins contain multiple predicted transmembrane regions. 
[0480] 148. DegT/DnrJ/EryC1/StrS family 

[0481] The members of this family exhibit some characteristics of the sensor protein of two-component signal
transduction systems; however, none of the members show any sequence similarity to these protein kinases. The members
of this family do have the typical helix-turn-helix motif of DNA binding proteins.
[0482] [1] Stutzman-Engwall KJ, Otten SL, Hutchinson CR, J Bacteriol 1992;174:144-154.
[0483] 149. (Desaturase) Fatty acid desaturases signatures 

Fatty acid desaturases (EC 1.14.99.-) are enzymes that catalyze the insertion of a double bond at the delta position
of fatty acids. There seem to be two distinct families of fatty acid desaturases which do not seem to be evolutionarily
related. Family 1 is composed of: - Stearoyl-CoA desaturase (SCD) (EC 1.14.99.5) [1]. SCD is a key regulatory enzyme
of unsaturated fatty acid biosynthesis. SCD introduces a cis double bond at the delta(9) position of fatty acyl-CoA's
such as palmitoleoyl- and oleoyl-CoA. SCD is a membrane-bound enzyme that is thought to function as a part of a
multienzyme complex in the endoplasmic reticulum of vertebrates and fungi. As a signature pattern for this family a
conserved region in the C-terminal part of these enzymes was selected; this region is rich in histidine residues and in
aromatic residues. Family 2 is composed of: - Plants stearoyl-acyl-carrier-protein desaturase (EC 1.14.99.6) [2]; these
enzymes catalyze the introduction of a double bond at the delta(9) position of stearoyl-ACP to produce oleoyl-ACP.
This enzyme is responsible for the conversion of saturated fatty acids to unsaturated fatty acids in the synthesis of
vegetable oils. - Cyanobacteria desA [3], an enzyme that can introduce a second cis double bond at the delta(12)
position of fatty acid bound to membranes glycerolipids. DesA is involved in chilling tolerance; the phase transition
temperature of lipids of cellular membranes being dependent on the degree of unsaturation of fatty acids of the
membrane lipids. As a signature pattern for this family a conserved region in the C-terminal part of these enzymes was
selected.

[0484] Consensus pattern: G-E-x-[FY]-H-N-[FY]-H-H-x-F-P-x-D-Y- 

Consensus pattern: [ST]-[SA]-x(3)-[QR]-[LI]-x(5,6)-D-Y-x(2)-[LIVMFYW]-[LIVM]-[DE]-

[ 1] Kaestner K.H., Ntambi J.M., Kelly T.J. Jr., Lane M.D. J. Biol. Chem. 264:14755-14761(1989).
[ 2] Shanklin J., Somerville C.R. Proc. Natl. Acad. Sci. U.S.A. 88:2510-2514(1991).
[ 3] Wada H., Gombos Z., Murata N. Nature 347:200-203(1990).

[0485] 150. Dihydroorotase signatures 

Dihydroorotase (EC 3.5.2.3) (DHOase) catalyzes the third step in the de novo biosynthesis of pyrimidine, the conversion
of ureidosuccinic acid (N-carbamoyl-L-aspartate) into dihydroorotate. Dihydroorotase binds a zinc ion which is required
for its catalytic activity [1]. In bacteria, DHOase is a dimer of identical chains of about 400 amino-acid residues (gene
pyrC). In higher eukaryotes, DHOase is part of a large multi-functional protein known as 'rudimentary' in Drosophila
and CAD in mammals and which catalyzes the first three steps of pyrimidine biosynthesis [2]. The DHOase domain is
located in the central part of this polyprotein. In yeasts, DHOase is encoded by a monofunctional protein (gene URA4).
However, a defective DHOase domain [3] is found in a multifunctional protein (gene URA2) that catalyzes the first two
steps of pyrimidine biosynthesis. The comparison of DHOase sequences from various sources shows [4] that there
are two highly conserved regions. The first located in the N-terminal extremity contains two histidine residues suggested
[3] to be involved in binding the zinc ion. The second is found in the C-terminal part. Signature patterns for both regions
have been developed. Allantoinase (EC 3.5.2.5) is the enzyme that hydrolyzes allantoin into allantoate. In yeast (gene
DAL1) [5], it is the first enzyme in the allantoin degradation pathway; in amphibians [6] and fish it catalyzes the second
step in the degradation of uric acid. The sequence of allantoinase is evolutionarily related to that of DHOases.
[0486] Consensus pattern: D-[LIVMFYWSAP]-H-[LIVA]-H-[LIVF]-[RN]-x-[PGANF] [The two H's are probable zinc
ligands]-

Consensus pattern: [GA]-[ST]-D-x-A-P-H-x(4)-K- 

[ 1] Brown D.C., Collins K.D. J. Biol. Chem. 266:1597-1604(1991).
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[0487] 151. dnaJ domains signatures and profile 

nch ^ (W domain of .bout 30 ZdwT^ZwSl. ™^ *" 70 "**• a 



[Diagram: domain organization of dnaJ, running from the N-terminal region through the Gly-rich (Gly-R) region and the
CXXCXGXG repeats to the C-terminal region.]



[0489] It has been shown [2] that the 'J' domain as well as the 'CRR' domain are also found in other prokaryotic and
eukaryotic proteins which are listed below.

a) Proteins containing both a 'J' and a 'CRR' domain:

- Yeast protein MAS5/YDJ1, which seems to be involved in mitochondrial protein import.

- Yeast protein SCJ1, involved in protein sorting.

- Yeast protein XDJ1.

- Plants dnaJ homologs (from leek and cucumber).

- Human HDJ2, a dnaJ homolog of unknown function.

- Yeast hypothetical protein YNL077w.

b) Proteins containing a 'J' domain without a 'CRR' domain:

- Yeast protein CAJ1.

- Yeast hypothetical protein YFR041c.

- Yeast hypothetical protein YIR004w.

- Yeast hypothetical protein YJL162c.

- Plasmodium falciparum ring-infected erythrocyte surface antigen (RESA). RESA, whose function is not known, is
associated with the membrane skeleton.

- Human HSJ1, a neuronal protein.

- Drosophila cysteine-string protein (csp).



[ 1] Cyr D.M., Langer T., Douglas M.G. Trends Biochem. Sci. 19:176-181(1994).
[ 2] Bork P., Sander C., Valencia A., Bukau B. Trends Biochem. Sci. 17:129-129(1992).
[ 3] Ueguchi C., Kakeda M., Yamada H., Mizuno T. Proc. Natl. Acad. Sci. U.S.A. 91:1054-1058(1994).
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[0492] 152. 
[0493] 153. Dwarfin 

[0494] This family, known as the dwarfins, also includes the Drosophila protein MAD. The N-terminus of MAD can
bind to DNA [2].

[0495] [1] Yingling JM, Das P, Savage C, Zhang M, Padgett RW, Wang XF, Proc Natl Acad Sci U S A 1996;93:
8940-8944. [2] Kim J, Johnson K, Chen HJ, Carroll S, Laughon A, Nature 1997;388:304-308.
[0496] 154. Dynein light chain type 1 signature 

Dynein is a multisubunit microtubule-dependent motor enzyme that acts as the force generating protein of eukaryotic
cilia and flagella. The cytoplasmic isoform of dynein acts as a motor for the intracellular retrograde motility of vesicles
and organelles along microtubules. Dynein is composed of a number of ATP-binding large subunits, intermediate size
subunits and small subunits. Among the small subunits, there is a family [1,2] of highly conserved proteins which consist
of: - Chlamydomonas reinhardtii flagellar outer arm dynein 8 Kd and 11 Kd light chains. - Higher eukaryotes cytoplasmic
dynein light chain 1. - Yeast cytoplasmic dynein light chain 1 (gene DYN2 or SLC1). - Caenorhabditis elegans hypothetical
dynein light chains M18.2 and T26A5.9. These proteins have from 89 to 120 amino acids. As a signature pattern,
a highly conserved region was selected.
Consensus pattern: H-x-I-x-G-[KR]-x-F-[GA]-S-x-V-[ST]-[HY]-E-

[ 1] King S.M., Patel-King R.S. J. Biol. Chem. 270:11445-11452(1995). 

[ 2] Dick T, Ray K., Salz H.K., Chia W. Mol. Cell. Biol. 16:1966-1977(1996). 

[0497] 155. dUTPase 

[0498] dUTPase hydrolyzes dUTP to dUMP and pyrophosphate.

[0499] [1] Cedergren-Zeppezauer ES, Larsson G, Nyman PO, Dauter Z, Wilson KS, Nature 1992;355:740-743. [2]
Mol CD, Harris JM, McIntosh EM, Tainer JA, Structure 1996;4:1077-1092.

[0500] 156. (dCMP_cyt_deam) Cytidine and deoxycytidylate deaminases zinc-binding region signature
Cytidine deaminase (EC 3.5.4.5) (cytidine aminohydrolase) catalyzes the hydrolysis of cytidine into uridine and ammonia
while deoxycytidylate deaminase (EC 3.5.4.12) (dCMP deaminase) hydrolyzes dCMP into dUMP. Both enzymes
are known to bind zinc and to require it for their catalytic activity [1,2]. These two enzymes do not share any sequence
similarity with the exception of a region that contains three conserved histidine and cysteine residues which are thought
to be involved in the binding of the catalytic zinc ion. Such a region is also found in other proteins [3,4]: - Yeast cytosine
deaminase (EC 3.5.4.1) (gene FCY1) which transforms cytosine into uracil. - Mammalian apolipoprotein B mRNA
editing protein, responsible for the post-transcriptional editing of a CAA codon into a UAA (stop) codon in the APOB
mRNA. - Riboflavin biosynthesis protein ribG, which converts 2,5-diamino-6-(ribosylamino)-4(3H)-pyrimidinone
5-phosphate into 5-amino-6-(ribosylamino)-2,4(1H,3H)-pyrimidinedione 5'-phosphate. - Bacillus cereus blasticidin-S
deaminase (EC 3.5.4.23), which catalyzes the deamination of the cytosine moiety of the antibiotics blasticidin S,
cytomycin and acetylblasticidin S. - Bacillus subtilis protein comEB. This protein is required for the binding and uptake
of transforming DNA. - Bacillus subtilis hypothetical protein yaaJ. - Escherichia coli hypothetical protein yfhC. - Yeast
hypothetical protein YJL035c. A signature pattern for this zinc-binding region was derived.

[0501] Consensus pattern: [CH]-[AGV]-E-x(2)-[LIVMFGAT]-[LIVM]-x(17,33)-P-C-x(2,8)-C-x(3)-[LIVM] [The C's and
H are zinc ligands]

[ 1] Yang C., Carlow D., Wolfenden R., Short S.A. Biochemistry 31:4168-4174(1992).

[ 2] Moore J.T., Silversmith R.E., Maley G.F., Maley F. J. Biol. Chem. 268:2288-2291(1993).

[ 3] Reizer J., Buskirk S., Bairoch A., Reizer A., Saier M.H. Jr. Protein Sci. 3:853-856(1994).

[ 4] Bhattacharya S., Navaratnam N., Morrison J.R., Scott J., Taylor W.R. Trends Biochem. Sci. 19:105-106(1994).
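
The zinc-binding signature above contains two variable-length gaps, x(17,33) and x(2,8), and its note says the C's and the H are zinc ligands. The sketch below is an illustration only, assuming a hand translation of that pattern into a regular expression in which the three ligand residues are captured as groups. The function name and the toy sequences are hypothetical and do not come from any protein in this specification.

```python
import re

# Hand translation of the consensus pattern in the text. The parentheses capture
# the three zinc-ligand residues: the leading C or H and the two cysteines.
ZINC_REGION = re.compile(
    r"([CH])[AGV]E.{2}[LIVMFGAT][LIVM].{17,33}P(C).{2,8}(C).{3}[LIVM]"
)


def zinc_ligand_positions(sequence):
    """Return the 1-based positions of the three putative zinc ligands in the
    first match of the signature, or None if the signature is absent."""
    match = ZINC_REGION.search(sequence.upper())
    if match is None:
        return None
    return tuple(match.start(group) + 1 for group in (1, 2, 3))


# Made-up test peptides: one with a 20-residue first gap (allowed by x(17,33)),
# one with a 10-residue gap (too short to match).
TOY_HIT = "MA" + "HAEKKLI" + "G" * 20 + "PCAACDDDV" + "K"
TOY_MISS = "MA" + "HAEKKLI" + "G" * 10 + "PCAACDDDV" + "K"
```

With these toy inputs, `zinc_ligand_positions(TOY_HIT)` locates the ligands while `zinc_ligand_positions(TOY_MISS)` returns `None`, which shows how the gap bounds limit what the pattern accepts.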

[0502] 157. Dehydrins signatures 

A number of proteins are produced by plants that experience water-stress. Water-stress takes place when the water
available to a plant falls below a critical level. The plant hormone abscisic acid (ABA) appears to modulate the response
of plants to water-stress. Proteins that are expressed during water-stress are called dehydrins [1,2] or LEA group 2
proteins [3]. The proteins that belong to this family are listed below.

- Arabidopsis thaliana XERO 1, XERO 2 (LTI30), RAB18, ERD10 (LTI45), ERD14 and COR47.

- Barley dehydrins B8, B9, B17, and B18.

- Cotton LEA protein D-11.

- Craterostigma plantagineum desiccation-related proteins A and B.

- Maize dehydrin M3 (RAB-17).

- Pea dehydrins DHN1, DHN2, and DHN3.



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Radish LEA protein. 

- Rice proteins RAB 16B, 16C, 16D, RAB21, and RAB25.

- Tomato TAS14.






- Wheat dehydrin RAB 15 and cold-shock proteins cor410, cs66 and cs120.

Such a region has been found in all known dehydrins so far. A second conserved feature is the
presence of two copies of a lysine-rich octapeptide; the first copy is located just after the cluster of charged residues
that follows the poly-serine region and the second copy is found at the C-terminal extremity. Signature patterns for
both regions were derived.

[0504] Consensus pattern: S(5)-[DE]-x-[DE]-G-x(1,2)-G-x(0,1)-[KR](4)
Consensus pattern: [KR]-[LIM]-K-[DE]-K-[LIM]-P-G-

[ 1] Close T.J., Kortt A.A., Chandler P.M. Plant Mol. Biol. 13:95-108(1989).
[ 2] Robertson M., Chandler P.M. Plant Mol. Biol. 19:1031-1044(1992).
[ 3] Dure L. III, Crouch M., Harada J., Ho T.-H.D., Mundy J., Quatrano R., Thomas T., Sung Z.R. Plant Mol. Biol.

12:475-486(1989).
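
Because two signature patterns are given for the dehydrins, a sequence can be screened for each one separately and then for both together. The sketch below is only an illustration under stated assumptions: the two regular expressions are hand translations of the consensus patterns above, the function names are hypothetical, and requiring both signatures is an illustrative choice rather than a rule stated in this specification.

```python
import re

# Hand translations of the two dehydrin consensus patterns given in the text.
SERINE_CLUSTER = re.compile(r"S{5}[DE].[DE]G.{1,2}G.{0,1}[KR]{4}")
LYSINE_RICH = re.compile(r"[KR][LIM]K[DE]K[LIM]PG")


def dehydrin_signatures(sequence):
    """Return the 1-based start positions of each signature in the sequence."""
    seq = sequence.upper()
    return {
        "serine_cluster": [m.start() + 1 for m in SERINE_CLUSTER.finditer(seq)],
        "lysine_rich": [m.start() + 1 for m in LYSINE_RICH.finditer(seq)],
    }


def has_both_signatures(sequence):
    """True when at least one hit of each signature is present (an assumption
    made for illustration; a single signature may also be informative)."""
    hits = dehydrin_signatures(sequence)
    return bool(hits["serine_cluster"]) and bool(hits["lysine_rich"])


# Made-up peptide carrying one copy of each signature.
TOY_DEHYDRIN = "MA" + "SSSSSDAEGAGKKKK" + "TT" + "KIKEKLPG" + "H"
```

Calling `dehydrin_signatures(TOY_DEHYDRIN)` lists where each region starts in the toy peptide, and `has_both_signatures` combines the two results into a single yes/no answer.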

[0505] 158. (deoR) Bacterial regulatory proteins, deoR signature

accR, the Agrobacterium tumefaciens plasS oS?^!^ nf SUWam " ,eS 9 roups 108 ™*>™9 proteins[1 .2]: - 
the Escherichia coli aga operon 9^^^'^^^ T^*""?* 

the Escherichia coli L-fucose operon activator - J^e E^S f ^"^^^ re P'«ssor. - f ucR. 

Escherichiaco.ig^cero.-3-phc^phateregulonrepre^; o^Z^^l^^-^ repressor - " ** 

-io'R.'romBaci..ussubti.is.-lacR.^^^ 

subtilis transcription regulator of the sigKgeT^R^^ 

colit^ 

- VJhJ. an Escherichia coli hypothetical protein The ^Z**IxJ™7^T Eschenchla «* hypothetical protein, 
the N-terminal part of the sequence. The pattern used to detect these proteins starts fourteen residues before the HTH
motif and ends one residue after it.

[0506, C ~P-em:^^ 

f 2 ! SSSK^oS^^ ^ A - « US - ^7(1992). 

[0507] 159. dsrm 
Double-stranded RNA binding motif 

SiSSS:^ 

localize of a, least five different mRNAs in the ^££SE^f£^ ■* ** h iS inVOlved in 

«n humans, which is part of the cellular response ^0^^ * by protein kinase 

[0509] Number of members: 116 
[0510] 160. Dynamin family signature 

prote.ns: - Drosophila shibire protein (gene shi) 13TS™ e " structurally related to the following 

dynamin. I, seems to provide! JrZ^LT^n^/ T' ,h ° . DrOSOphila of mammalian 

VPS1 (or SPD15) [4], a protein which cou,d ZS^^^^S^ h VaCU °' ar ^ Pr ° ,8i " 
[5]. which is required for mitochondrial genome ma JemnS , r^**"* " Yeast pro,ein M G*<» 

-Interferon induced Mx proteins [6,7J il^n^^ J S &n ° NM1 ' "*"* 18 inVolved in endocytosis 
Most o, these proteins are known o cSS 

malian cell in cuKure. The three motifs iZ7^J P T^ZT a ^ rt "**»*»" °" ,rans,ec,ed "»»»■ 
promins.Thesigr«^^ 

of the ATP/GTP-binding motif TV (P-loop) (see <£S^T ^ 3 "'Conserved region downstream 
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[0511] Consensus pattern: L-P-[RK}-G-[STNHGNHLIVM]-V-T-R- 

[ 1] Vallee R.B., Shpetner H.S. Annu. Rev. Biochem. 59:909-932(1990).

[ 2] Obar R.A., Collins C.A., Hammarback J.A., Shpetner H.S., Vallee R.B. Nature 347:256-261(1990).
[ 3] van der Bliek A., Meyerowitz E.M. Nature 351:411-414(1991).

[ 4] Rothman J.H., Raymond C.K., Gilbert T., O'Hara P.J., Stevens T.H. Cell 61:1063-1074(1990).

[ 5] Jones B.A., Fangman W.L. Genes Dev. 6:380-389(1992).

[ 6] Arnheiter H., Meier E. New Biol. 2:851-857(1990).

[ 7] Staeheli P., Pitossi F., Pavlovic J. Trends Cell Biol. 3:268-272(1993).

[0512] 161. (dynamin_2) Dynamin central region

[0513] This region lies between the GTPase domain, see dynamin, and the pleckstrin homology (PH) domain.
[0514] 162. E1-E2 ATPases phosphorylation site

E1-E2 ATPases (also known as P-type) are cation transport ATPases which form an aspartyl phosphate intermediate
in the course of ATP hydrolysis. ATPases which belong to this family are listed below [1,2,3]. - Fungal and plant plasma
membrane (H+) ATPases [reviewed in 4]. - Vertebrate (Na+, K+) ATPases (sodium pump) [reviewed in 5,6]. - Gastric
(K+, H+) ATPases (proton pump). - Calcium (Ca++) ATPases (calcium pump) from the sarcoplasmic reticulum (SR),
the endoplasmic reticulum (ER) and the plasma membrane. - Copper (Cu++) ATPases (copper pump) which are
involved in two human genetic disorders: Menkes syndrome and Wilson disease [7]. - Bacterial potassium (K+) ATPases.
- Bacterial cadmium efflux (Cd++) ATPases [reviewed in 8]. - Bacterial magnesium (Mg++) ATPases. - A probable
cation ATPase from Leishmania. - fixI, a probable cation ATPase from Rhizobium meliloti, involved in nitrogen fixation.
The region around the phosphorylated aspartate residue is perfectly conserved in all these ATPases and can be used
as a signature pattern.

[0515] Consensus pattern: D-K-T-G-T-[LI]-[TI] [D is phosphorylated]

[ 1] Green N.M., McLennan D.H. Biochem. Soc. Trans. 17:819-822(1989).

[ 2] Green N.M. Biochem. Soc. Trans. 17:970-972(1989).

[ 3] Fagan M.J., Saier M.H. Jr. J. Mol. Evol. 38:57-99(1994).

[ 4] Serrano R. Biochim. Biophys. Acta 947:1-28(1988).

[ 5] Fambrough D.M. Trends Neurosci. 11:325-328(1988).

[ 6] Sweadner K.J. Biochim. Biophys. Acta 988:185-220(1989).

[ 7] Bull P.C., Cox D.W. Trends Genet. 10:246-251(1994).

[ 8] Silver S., Nucifora G., Chu L., Misra T.K. Trends Biochem. Sci. 14:76-80(1989).
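
Since the E1-E2 signature is anchored on the phosphorylated aspartate, a scan for the pattern directly yields candidate phosphorylation positions. The following is a minimal sketch, assuming the pattern reads D-K-T-G-T-[LI]-[TI] as printed above. The function name and the toy sequences are hypothetical.

```python
import re

# Hand translation of D-K-T-G-T-[LI]-[TI]; the first residue of each match is
# the aspartate that the text describes as phosphorylated.
PHOSPHORYLATION_SITE = re.compile(r"DKTGT[LI][TI]")


def phosphorylated_asp_positions(sequence):
    """Return the 1-based position of the aspartate in every signature match."""
    return [m.start() + 1 for m in PHOSPHORYLATION_SITE.finditer(sequence.upper())]
```

For instance, `phosphorylated_asp_positions("MAIDKTGTLTQN")` points at the aspartate of the made-up peptide, whereas a peptide with an alanine in the sixth pattern position yields an empty list.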

[0516] 163. E1_N 

E1 Protein, N terminal domain 

Number of members: 90 

[0517] 164. (E1_dehydrog) Dehydrogenase E1 component

[0518] This family uses thiamine pyrophosphate as a cofactor. This family includes pyruvate dehydrogenase, 2-ox- 
oglutarate dehydrogenase and 2-oxoisovalerate dehydrogenase. 
[0519] 165. (ECH) Enoyl-CoA hydratase/isomerase signature 

Enoyl-CoA hydratase (EC 4.2.1.17) (ECH) [1] and 3-2trans-enoyl-CoA isomerase (EC 5.3.3.8) (ECI) [2] are two enzymes
involved in fatty acid metabolism. ECH catalyzes the hydration of 2-trans-enoyl-CoA into 3-hydroxyacyl-CoA
and ECI shifts the 3- double bond of the intermediates of unsaturated fatty acid oxidation to the 2-trans position. Most
eukaryotic cells have two fatty-acid beta-oxidation systems, one located in mitochondria and the other in peroxisomes.
In mitochondria, ECH and ECI are separate yet structurally related monofunctional enzymes. Peroxisomes contain a
trifunctional enzyme [3] consisting of an N-terminal domain that bears both ECH and ECI activity, and a C-terminal
domain responsible for 3-hydroxyacyl-CoA dehydrogenase (HCDH) activity. In Escherichia coli (gene fadB) and
Pseudomonas fragi (gene faoA), ECH and ECI are also part of a multifunctional enzyme which contains both a HCDH and
a 3-hydroxybutyryl-CoA epimerase domain [4]. A number of other proteins have been found to be evolutionarily related
to the ECH/ECI enzymes or domains: - 3-hydroxybutyryl-CoA dehydratase (EC 4.2.1.55) (crotonase), a bacterial enzyme
involved in the butyrate/butanol-producing pathway. - Naphthoate synthase (EC 4.1.3.36) (DHNA synthetase) (gene
menB) [5], a bacterial enzyme involved in the biosynthesis of menaquinone (vitamin K2). DHNA synthetase converts
O-succinyl-benzoyl-CoA (OSB-CoA) to 1,4-dihydroxy-2-naphthoic acid (DHNA). - 4-chlorobenzoate dehalogenase
(EC 3.8.1.6) [6], a Pseudomonas enzyme which catalyzes the conversion of 4-chlorobenzoate-CoA to
4-hydroxybenzoate-CoA. - A Rhodobacter capsulatus protein of unknown function (ORF257) [7]. - Bacillus subtilis putative polyketide
biosynthesis proteins pksH and pksI. - Escherichia coli carnitine racemase (gene caiD) [8]. - Escherichia coli hypothetical
protein ygfG. - Yeast hypothetical protein YDR036c. As a signature pattern for these enzymes, a conserved region
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richin glycine and hydrophobic residues was selected 

3 Pto. Ml. Hitaon J.K. J. Biol. Chem. 265:2441-2449(1990) 

LtlC^K^™;^.^''- "• <*- p..H„ 

[ 7] Beckman D.L., Kranz R.G. Gene 107:171-172(1991).

[ 8] Eichler K., Bourgis F., Buchet A., Kleber H.-P., Mandrand-Berthelot M.-A. Mol. Microbiol. 13:775-786(1994).
166. (EF1BD) Elongation factor 1 beta/beta'/delta chain signatures

chain thatprobablyplaysarote in anc^Vheco^ 
chain, The beta and deKa chains are 

alpha chain for GTP [2J. The beta and tetoctete anh^l^J T * ° f GDP ^ to 

part seems important for me * 23 ,0 31 Kd Their C «nal 

interaction with the gamma chain. Two signature £^J5£ tl^ TT '* P ' 0t> ^ involved in th ° 

spends toan acidic region in the centra. sS r ? "T™ WWB ^ ^ c °"*' 

[0522] Consensus pattern: [D EH DEG H DeSTl^D L F G * ^ Pr ° ,einS 
Consensus pattern: [IV]-Q-S-x-D-[LIVM]-x-A-[FWM]-[NQ]-K-[LIVM].

!2 * ^KTF:^^^^^- «* 15:420.24(1990). 
1050:241-247(1990). 68 TimmersC J - ^™GMC., MoellerW. Bioch™. Biophys. Acta 

167. (EF1G_domain) Elongation factor 1 gamma

168. (EFG_C) Elongation factor G C-terminus

IU5Z5] ' his family is always found associated with GTP fpti i Thie»= k • , 

To.GTP complex op to the GTP h^ota** JS^L^^sT^ '° ™ AEF - 
prol.in NoiynMlo oaohinoo, and is (rawed ? Z Z™T 1 1 f " ,ls ° ' ""W™ « »• chlooplaa 

[ 1] Bubunenko M.G., Kireeva M.L., Gudkov A.T. Biochimie 74:419-425(1992).

2 v u T M - Ze,SChe K P ' ant Mo1 Bio1 - 23:67-76(1993) 
I 3] X,n H., Woriax V.L, Burkhar, W.A., SpremuHi LL. J. Bio.. Cham 



S mI1!^Z^ 25L ? ^4/gp25typ24 fami, 
[0533] Paccaud JP, Thomas DY, Bergeron JJ, Nilsson T, J Cell Biol 1998;140:751-765.

172. ENV_polyprotein 

ENV polyprotein (coat polyprotein)

Number of members: 224 

[0534] 173. (ERG4_ERG24) Ergosterol biosynthesis ERG4/ERG24 family signatures 

Two fungal enzymes involved in ergosterol biosynthesis and which act by reducing double bonds in precursors of
ergosterol have been shown to be evolutionarily related [1]. These are C-14 sterol reductase (gene ERG24 in budding
yeast and erg3 in Neurospora crassa) and C-24(28) sterol reductase (gene ERG4 in budding yeast and sts1 in fission
yeast). Their sequences are also highly related to that of chicken lamin B receptor, which is thought to anchor the
lamina to the inner nuclear membrane. These proteins are highly hydrophobic and seem to contain seven or eight
transmembrane regions. As signature patterns, two conserved regions were selected. The first one is apparently
located in a loop between the fourth and fifth transmembrane regions and the second is in the C-terminal section.
[0535] Consensus pattern: G-x(2)-[LIVM]-[YH]-D-x-[FYW]-x-G-x(2)-L-N-P-R-
Consensus pattern: [LIVM](2)-H-R-x(2)-R-D-x(3)-C-x(2)-K-Y-G-

[ 1] Lai M.K., Bard M., Pierson C.A., Alexander J.F., Goebl M., Carter G.T., Kirsch D.R. Gene 140:41-49(1994).
[0536] 174. (ERM) Ezrin/radixin/moesin family

[0537] The proteins of this family contain a band 4.1 domain (Band_41) at their amino terminus. This family represents
the rest of these proteins.

[0538] [1] Yonemura S, Hirao M, Doi Y, Takahashi N, Kondo T, Tsukita S, J Cell Biol 1998;140:885-895. 
[0539] 175. ER lumen protein retaining receptor signatures 

Proteins that reside in the lumen of the endoplasmic reticulum (ER) contain a C-terminal tetrapeptide (generally
K-D-E-L or H-D-E-L) that serves as a signal for their retrieval (retrograde transport) from subsequent compartments of the
secretory pathway. The signal is recognized by a receptor molecule that is believed to cycle between the cis side of
the Golgi apparatus and the ER [1]. This protein is known as the ER lumen protein retaining receptor or also as the
'KDEL receptor'. It has been characterized in a variety of species, including fungi (gene ERD2), plants, Plasmodium,
Drosophila and mammals. In mammals two highly related forms of the receptor are known. Structurally, the receptor
is a protein of about 220 residues that seems to contain seven transmembrane regions [2]. The N-terminal part (3
residues) is oriented toward the lumen while the C-terminal tail (about 12 residues) is cytoplasmic. There are three
lumenal and three cytoplasmic loops. Two signature patterns for these receptors were developed. The first pattern
corresponds to the C-terminal half of the first cytoplasmic loop as well as most of the second transmembrane domain.
The second pattern is a perfectly conserved decapeptide that corresponds to the central part of the fifth transmembrane
domain.

[0540] Consensus pattern: G-I-S-x-[KR]-x-Q-x-L-[FY]-x-[LIV](2)-F-x(2)-R-Y-
Consensus pattern: L-E-[SA]-V-A-I-[LM]-P-Q-L- 

[ 1] Pelham H.R.B. Curr. Opin. Cell Biol. 3:585-591(1991). 

[ 2] Townsley F.M., Wilson D.W., Pelham H.R.B. EMBO J. 12:2821-2829(1993). 
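
The C-terminal retrieval signal described above lends itself to a very simple check. The sketch below is a minimal illustration, assuming that only the two tetrapeptides named in the text (K-D-E-L and H-D-E-L) are tested; the function name and the example sequences are hypothetical.

```python
# The two most common retrieval tetrapeptides named in the text.
ER_RETRIEVAL_SIGNALS = ("KDEL", "HDEL")

def has_er_retrieval_signal(protein_seq: str) -> bool:
    """Return True if the sequence ends with K-D-E-L or H-D-E-L.

    The signal must sit at the very C-terminus, so an internal
    occurrence of the tetrapeptide does not count.
    """
    return protein_seq.strip().upper().endswith(ER_RETRIEVAL_SIGNALS)

# Hypothetical sequences used only to exercise the function.
print(has_er_retrieval_signal("MKWVTFLLLAAKDEL"))   # signal at the C-terminus
print(has_er_retrieval_signal("MKDELAAA"))          # internal occurrence only
```

Because the text says the tetrapeptide is "generally" one of these two, a negative result from this check does not rule out ER residence.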

[0541] 176. (ETF_beta) Electron transfer flavoprotein beta-subunit signature 

The electron transfer flavoprotein (ETF) [1,2] serves as a specific electron acceptor for various mitochondrial
dehydrogenases. ETF transfers electrons to the main respiratory chain via ETF-ubiquinone oxidoreductase. ETF is a
heterodimer that consists of an alpha and a beta subunit and binds one molecule of FAD per dimer. A similar system
also exists in some bacteria. The beta subunit of ETF is a protein of about 28 Kd which is structurally related to the
bacterial nitrogen fixation protein fixA which could play a role in a redox process and feed electrons to ferredoxin. Other
related proteins are: - Escherichia coli hypothetical protein ydiQ. - Escherichia coli hypothetical protein ygcR. As a
signature pattern for these proteins, a conserved region which is located in the central section was selected.
[0542] Consensus pattern: [IVA]-x-[KR]-x(2)-[DE]-[GD]-[GDE]-x(1,2)-[EQ]-x-[LIV]-x(4)-P-x-[LIVM](2)-[TAC]-

[ 1] Finocchiaro G., Ikeda Y., Ito M., Tanaka K. Prog. Clin. Biol. Res. 321:637-652(1990).
[ 2] Tsai M.H., Saier M.H. Jr. Res. Microbiol. 146:397-404(1995).

[0543] 177. Endonuclease III signatures 

Escherichia coli endonuclease III (EC 4.2.99.18) (gene nth) [1] is a DNA repair enzyme that acts both as a DNA
N-glycosylase, removing oxidized pyrimidines from DNA, and as an apurinic/apyrimidinic (AP) endonuclease, introducing
a single-strand nick at the site from which the damaged base was removed. Endonuclease III is an iron-sulfur protein
that binds a single 4Fe-4S cluster. The 4Fe-4S cluster does not seem to be important for catalytic activity, but is probably
involved in the proper positioning of the enzyme along the DNA strand [2]. Endonuclease III is evolutionarily related to
the following proteins: - Fission yeast endonuclease III homolog (gene nth1) [3]. - Escherichia coli and related protein
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•^-ORFIOhplasmidpFVIofthememMphfcaSS^ 
tionmethylasem.MthTI. which is encoded by thMasmM o^ 
resulting in G-T mismatches. This protein coulXr^^^^ 

Fission yeast hypothetical protein SpAC26A3 02 ? „" S< h yP° the,ica ' P*** YAL015c. - 

coccus jannaschii hyr^thetL. proton S? 3 ^ 

a 1 /amino acid region at the C-terminal end of en^nuclSe IH A ? S,e,neS Which are a » ,oca,ed j " 
of mutY and in the C-terminus of ORFIOand of the^c^!, L ^ ! B ^ preSent h ,ho cen,ra ' action 
no, exist in YAL015c. Two signature i^Vl^S^^TT^ ^ **** C ' US,er re 9 ion *" 
the iron-sulfur binding domain me io^JS^^^.^S? *?* 1°^^ ,0 "» Core 01 
enzymes. "*sponas to tne best conserved region in the catalytic core of these 

[0544] Consensus pattern: C-x(3HKRSVP-fKFWGLl-C-xi2> C w^uo rrh« . ~ 

Consensus pattern: [GSTJ-x^UVMR-P-wswlix^wi J? S Sof c PJ? °" r C 8 m 4F<MS "fl"** 

[UVMFYWUGANKjl 1 ' ' HL '™^" X(2 - 3HU H^^ 

, ^aaen n.i.i_, ae vos W.M. Nucleic Acids Res. 20:6501-6507(1992). 

S^-^SZT" 38 3 71,9 h '-^ - "uceotide-sugar substrates 

^ He9eman AD ' WeSe "^ G - <**>™ K'V PA. Ho«en HM, B^chem.try 1997 36- 
[0548] 179. Exonuclease 

[055?] Sea^- De ° ,SCherMP - N -.eic Acids Res 1993.21:2521-2522. 
[0554] Number of members: 29

[0555] 181. (eIF-1A) Eukaryotic initiation factor 1A signature
fcukaryotic translation initiation factor 1A telF-1 A\ m nZ~^w. ^ 

for maxima, rate of protein biosynth's s K^'J^^T^ " ' ^ ** ^ to be ^ ed 
the initiator Met-tRNA to 40S rirLoma. su^^a^J^^ T S ( Ubuni,s and stabi.izes.he binding of 
also seem to possess a elF-lA homoiog. aT^J^^T^T^I ° f about 15 to 17 ■«• Archaebacteria 
proteins was selected. stature pattern, a conserved reg.on in the central section of these 

53 [ C lTwei S C US L Pa !2 " [ * Mh M X t X - [GSHKRHW4HCL ^ D ^ x-G 

^theonryeukaryoticproteintocc^ 

additon of a buty,amino group (from sperS)^^ 

[055 9) Consensus pattern: P T>G-K-H-G^^^^ 

[ a M H ,- WOW EC - F °' k J E Bio,actore 4:95-104(1993) 

I : |J Scbmer ,. Schwe,ber 9 er KG.. Smit-McBrtfe Z. Kang H.A., Hershey ,W.B. Mo,. Ce„. «* 11:310Mm 
[0560] 183. (efhand) S-100/ICaBP type calcium binding protein signature

S-100 are small dimeric acidic calcium and zinc-binding proteins [1] abundant in the brain. They have two different
types of calcium-binding sites: a low affinity one with a special structure and a 'normal' EF-hand type high affinity site.
The vitamin-D dependent intestinal calcium-binding proteins (ICaBP or calbindin 9 Kd) also belong to this family of
proteins, but they do not form dimers. In the past years the sequences of many new members of this family have been
determined (for reviews see [2,3,4]); in most cases the function of these proteins is not yet known, although it is
becoming clear that they are involved in cell growth and differentiation, cell cycle regulation and metabolic control. These
proteins are: -

Calcyclin (Prolactin receptor associated protein (PRA); clatropin; 2a9; 5B10; S100A6). - Calpactin I light chain (p10;
p11; 42c; S100A10). - Calgranulin A (cystic fibrosis antigen (CFAg); MIF related protein 8 (MRP-8); p8; S100A8). -
Calgranulin B (MIF related protein 14 (MRP-14); p14; S100A9). - Calgranulin C. - Calgizzarin (S100C). - Placental
calcium-binding protein (CAPL) (18a2; peL98; 42a; p9K; MTS1; metastatin; S100A4). - Protein S-100D (S100A5). -
Protein S-100E (S100A3). - Protein S-100L (CAN19; S100A2). - Placental protein S-100P (S100E). - Psoriasin
(S100A7). - Chemotactic cytokine CP-10 [5]. - Protein MRP-126 [6]. - Trichohyalin [7]. This is a large intermediate
filament-associated protein that associates with keratin intermediate filaments (KIF); it contains an S-100 type domain
in its N-terminal extremity. A number of these proteins are known to bind calcium while others are not (p10 for example).
Our EF-hand detecting pattern will fail to pick those proteins which have lost their calcium-binding properties. A pattern
was developed which unambiguously picks up proteins belonging to this family. This pattern spans the region of the
EF-hand high affinity site but makes no assumptions on the calcium-binding properties of this site.
[0561] Consensus pattern: [LIVMFYW](2)-x(2)-[LK]-D-x(3)-[DN]-x(3)-[DNSG]-[FY]-x-[ES]-[FYVC]-x(2)-[LIVMFS]-[LIVMF]

[ 1] Baudier J. (In) Calcium and Calcium Binding proteins, Gerday C., Bollis L., Giller R., Eds., pp102-113, Springer
Verlag, Berlin, (1988).

[ 2] Moncrief N.D., Kretsinger R.H., Goodman M. J. Mol. Evol. 30:522-562(1990).
[ 3] Kligman D., Hilt D.C. Trends Biochem. Sci. 13:437-443(1988).

[ 4] Schaefer B.W., Wicki R., Engelkamp D., Mattei M.-G., Heizmann C.W. Genomics 25:638-643(1995).
[ 5] Lackmann M., Cornish C.J., Simpson R.J., Moritz R.L., Geczy C.L. J. Biol. Chem. 267:7499-7504(1992).
[ 6] Nakano T., Graf T. Oncogene 7:527-534(1992).

[ 7] Lee S.-C., Kim I.-G., Marekov L.N., O'Keefe E.J., Parry D.A.D., Steinert P.M. J. Biol. Chem. 268:12164-12176
(1993).

EF-hand calcium-binding domain

Many calcium-binding proteins belong to the same evolutionary family and share a type of calcium-binding domain
known as the EF-hand [1 to 5]. This type of domain consists of a twelve residue loop flanked on both sides by a twelve
residue alpha-helical domain. In an EF-hand loop the calcium ion is coordinated in a pentagonal bipyramidal
configuration. The six residues involved in the binding are in positions 1, 3, 5, 7, 9 and 12; these residues are denoted by X,
Y, Z, -Y, -X and -Z. The invariant Glu or Asp at position 12 provides two oxygens for liganding Ca (bidentate ligand).
Listed below are the proteins which are known to contain EF-hand regions. For each type of protein the total number
of EF-hand regions known or supposed to exist is indicated in parentheses. This number does not include regions
which clearly have lost their calcium-binding properties, or the atypical low-affinity site (which spans thirteen residues)
found in the S-100/ICaBP family of proteins [6].
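
The positional scheme described above (loop positions 1, 3, 5, 7, 9 and 12 labelled X, Y, Z, -Y, -X and -Z) can be made concrete with a short sketch. The helper names and the twelve-residue loop used below are illustrative assumptions and are not taken from this disclosure.

```python
# Loop positions (1-based) and their conventional coordination labels.
LOOP_POSITIONS = {1: "X", 3: "Y", 5: "Z", 7: "-Y", 9: "-X", 12: "-Z"}

def coordinating_residues(loop: str) -> dict:
    """Map each coordination label to the residue found at that position
    of a twelve-residue EF-hand loop."""
    if len(loop) != 12:
        raise ValueError("an EF-hand loop is expected to span 12 residues")
    return {label: loop[pos - 1] for pos, label in LOOP_POSITIONS.items()}

def has_bidentate_ligand(loop: str) -> bool:
    """Check for the invariant Glu or Asp at position 12."""
    return len(loop) == 12 and loop[11] in "DE"

# Illustrative twelve-residue loop (an assumption, for demonstration only).
example_loop = "DKDGDGTITTKE"
ligands = coordinating_residues(example_loop)
```

For the example loop, the labels X, Y and Z map to Asp, -Y and -X map to Thr, and -Z maps to the Glu at position 12.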

- Aequorin and Renilla luciferin binding protein (LBP) (Ca=3).
- Alpha actinin (Ca=2).
- Calbindin (Ca=4).
- Calcineurin B subunit (protein phosphatase 2B regulatory subunit) (Ca=4).
- Calcium-binding protein from Streptomyces erythraeus (Ca=3?).
- Calcium-binding protein from Schistosoma mansoni (Ca=2?).
- Calcium-binding proteins TCBP-23 and TCBP-25 from Tetrahymena thermophila (Ca=4?).
- Calcium-dependent protein kinases (CDPK) from plants (Ca=4).
- Calcium vector protein from amphioxus (Ca=2).
- Calcyphosin (thyroid protein p24) (Ca=4?).
- Calmodulin (Ca=4, except in yeast where Ca=3).
- Calpain small and large chains (Ca=2).
- Calretinin (Ca=6).
- Calcyclin (prolactin receptor associated protein) (Ca=2).
- Caltractin (centrin) (Ca=2 or 4).
- Cell Division Control protein 31 (gene CDC31) from yeast (Ca=2?).



- Diacylglycerol kinase (EC 2.7.1.107) (DGK) (Ca=2).

- FAD-dependent glycerol-3-phosphate dehydrogenase (EC 1.1.99.5) from mammals (Ca=1). - Fimbrin (plastin)
(Ca=2).

' tSZ S2tSS55r Ph ^ C - —1 (C-2) Il0] . - Intestina, ca,ciur, 

- MIF related proteins 8 (MRP-8 or CFAG) and 14 (MRP-14) (Ca=2) 

- Myosm regulatory light chains (Ca=l). -OncomodulinfCa^) 

- Osteonectin (basement membrane protein BM-401 fSPARr\ an H . 

(QR1. matrix grycoprote* SC1) (see the «m^2S^7S!5^ .T^ " ,0S,e0neCtin ' 

" 2S2? h ^ * S - 1 °° Pr0tein - a,pha — bete cnainstca^" 
- Sarcoplasmic calcium-binding protein (SCPs) (Ca=2 to 3).

Serine/threonine protein phosphatase rdoc IBC 3 1 q ia (mm ^ 

(Ca=2). - Spectrin alpha chain (Ca=2) ^ OSOPh,la (Ca=2) " Sorcin V19 hamster 

- Squidulin (optic lobe calcium-binding protein) from squid (Ca=4) 

- Troponins C; from skeletal muscle (Ca.4). from card,c muscle Ws). from arthropods and molluscs (Ca-2, 

^Sa™^^^ 

pattern was devefoped wh'fch takes No a^ ° , a '"?*" "™ ^ a ™ 

-oop as well as thefirst residue wh,h follows the 

■ sss; pat,ern: d - mdnsh,l ^ den ^ 

exist in this position in functional Ca-binding sites ' reS ' dUeS been show " *> 

- Note: the pattern will, in some cases, miss one of the EF-hand regions in some proteins with multiple EF-hand

l^S^ Pr0,ei " Pr °«- ^5-490(1995), 2, Kretsinger R.H. Co.d Sprng Harbor Symp. 

[ 3] Moncrief N.D., Kretsinger R.H., Goodman M. J. Mol. Evol. 30:522-562(1990).
SSTr. ^' N.D., Kretsinger R H. J. Mol. Evol. SSeS t 992) 

[ 5] Heizmann C.W., Hunziker W. Trends Biochem. Sci. 16:98-103(1991).

[ 6] Kligman D., Hilt D.C. Trends Biochem. Sci. 13:437-443(1988).

8 HaTch? ?^ JameS MXQ - ReV " Bioch e rS.98(19e9) 
[ 8] Haiech J., Sallantin J. Biochimie 67:555-560(1985).

I 9] Chauvaux S., Bequin P Aubert 1 p ah=» ir X . . 

(1990). 9 ■^^•^^•^^•^rM..Bm m hKB» 0 ^j.2BS:2B^ 

[10] Bairoch A., Cox J.A. FEBS Lett. 269:454-456(1990).

[0562] 184. Enolase signature

Se^ n o, 2 - P hos P ho- D -g lyceral e ,0 phosphoe- 

is probabhy found in all organisms^. 2? T "H?*" ** *™ En0laSe 

zymes: alpha present in most tissues, beta in muscle? and mm™ , T h , three d ' fferem ,issu e-specific iso- 
of the major lens proteins in some fish, ^^T^ZJZ"^ ? ^ Tai ~»- one 

As a s,gnature pattern for enolase. the bes, conserved reoK vS se fJZ , M,onat y re,al «* to enolase. 
sequence.- erveo region was selected, it is located in the terminal third of the 

[0563] Consensus pattern: [LIV](3)-K-x-N-Q-I-G-[ST]-[LIV]-[ST]-[DE]-[STA]

[ 2] Wistow G., Piatigorsky J. Science 236:1554-1556(1987).
[0564] 185. (F-actin_cap_A) F-actin capping protein alpha subunit signatures

The F-actin capping protein binds in a calcium-independent manner to the fast growing ends of actin filaments (barbed
end) thereby blocking the exchange of subunits at these ends. Unlike gelsolin and severin this protein does not sever
actin filaments. The F-actin capping protein is a heterodimer composed of two unrelated subunits: alpha and beta. The
alpha subunit is a protein of about 268 to 286 amino acid residues whose sequence is well conserved in eukaryotic
species [1]. As signature patterns two highly conserved regions in the C-terminal section of the alpha subunit were
selected.

[0565] Consensus pattern: V-H-[FY](2)-E-D-G-N-V 
Consensus pattern: F-K-[AE]-L-R-R-x-L-P- 

[0566] [ 1] Cooper J.A., Caldwell J.E., Gattermeir D.J., Torres M.A., Amatruda J.F., Casella J.F. Cell Motil. Cytoskel- 
eton 18:204-214(1991). 
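
When a family is described by two signature patterns, as for the alpha subunit above, it can be useful to report where each one occurs. The following sketch hand-translates the two patterns into regular expressions; the dictionary keys, the function name and the test sequence are hypothetical and serve only as an illustration.

```python
import re

# Hand-translated regular expressions for the two patterns given above.
ALPHA_SIGNATURES = {
    "signature_1": "VH[FY]{2}EDGNV",   # V-H-[FY](2)-E-D-G-N-V
    "signature_2": "FK[AE]LRR.LP",     # F-K-[AE]-L-R-R-x-L-P
}

def find_signatures(seq: str, signatures: dict = ALPHA_SIGNATURES) -> dict:
    """Return the 1-based start positions of every signature in seq."""
    return {
        name: [m.start() + 1 for m in re.finditer(regex, seq)]
        for name, regex in signatures.items()
    }

# Hypothetical sequence containing both motifs, for demonstration only.
example_seq = "MAAA" + "VHYFEDGNV" + "QQ" + "FKALRRQLP" + "K"
hits = find_signatures(example_seq)
```

A sequence would be reported as a candidate member only if both lists are non-empty; whether the hits fall in the C-terminal section can be judged from the positions relative to the sequence length.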
[0567] 186. F-box domain

[0568] [1] Bai C, Sen P, Hofmann K, Ma L, Goebl M, Harper JW, Elledge SJ, Cell 1996;86:263-274. [2] Skowyra D,
Craig KL, Tyers M, Elledge SJ, Harper JW, Cell 1997;91:209-219.
[0569] 187. F-protein 
Negative factor, (F Protein) or Nef. 

[0570] [1] Arold S, Franken P, Strub M-P, Hon F, Benichou S, Benarous R, Dumas C; Medline: 98035457, The crystal
structure of HIV-1 Nef protein bound to the Fyn kinase SH3 domain suggests a role for this complex in altered T cell
receptor signalling. Structure 1997;5:1361-1372.

[0571] Nef protein accelerates virulent progression of AIDS by its interaction with cellular proteins involved in signal
transduction and host cell activation. Nef has been shown to bind specifically to a subset of the Src kinase family.
[0572] Number of members: 1013 
[0573] 188. (FAD_binding_2)

Fumarate reductase / succinate dehydrogenase FAD-binding site

In bacteria two distinct, membrane-bound, enzyme
complexes are responsible for the interconversion of fumarate and succinate (EC 1.3.99.1): fumarate reductase (Frd)
is used in anaerobic growth, and succinate dehydrogenase (Sdh) is used in aerobic growth. Both complexes consist
of two main components: a membrane-extrinsic component composed of a FAD-binding flavoprotein and an iron-sulfur
protein; and a hydrophobic component composed of a membrane anchor protein and/or a cytochrome B.
[0574] In eukaryotes mitochondrial succinate dehydrogenase (ubiquinone) (EC 1.3.5.1) is an enzyme composed of
two subunits: a FAD flavoprotein and an iron-sulfur protein.

[0575] The flavoprotein subunit is a protein of about 60 to 70 Kd in which FAD is covalently bound to a histidine
residue located in the N-terminal section of the protein [1]. The sequence around that histidine is well conserved
in Frd and Sdh from various bacterial and eukaryotic species [2] and can be used as a signature pattern.
[0576] Consensus pattern: R-[ST]-H-[ST]-x(2)-A-x-G-G [H is the FAD binding site]. Sequences known to belong to this
class detected by the pattern: ALL.

[ 1] Blaut M., Whittaker K., Valdovinos A., Ackrell B.A., Gunsalus R.P., Cecchini G. J. Biol. Chem. 264:13599-13604
(1989).

[ 2] Birch-Machin M.A., Farnsworth L., Ackrell B.A., Cochran B., Jackson S., Bindoff L.A., Aitken A., Diamond A.G.,
Turnbull D.M. J. Biol. Chem. 267:11553-11558(1992).

[0577] 189. Fatty acid desaturases signatures (FA_desaturase) 

Fatty acid desaturases (EC 1.14.99.-) are enzymes that catalyze the insertion of a double bond at the delta position
of fatty acids. There seem to be two distinct families of fatty acid desaturases which do not seem to be evolutionarily
related. Family 1 is composed of: - Stearoyl-CoA desaturase (SCD) (EC 1.14.99.5) [1]. SCD is a key regulatory enzyme
of unsaturated fatty acid biosynthesis. SCD introduces a cis double bond at the delta(9) position of fatty acyl-CoA's
such as palmitoleoyl- and oleoyl-CoA. SCD is a membrane-bound enzyme that is thought to function as a part of a
multienzyme complex in the endoplasmic reticulum of vertebrates and fungi. As a signature pattern for this family a
conserved region in the C-terminal part of these enzymes was selected; this region is rich in histidine residues and in
aromatic residues. Family 2 is composed of: - Plant stearoyl-acyl-carrier-protein desaturase (EC 1.14.99.6) [2]; these
enzymes catalyze the introduction of a double bond at the delta(9) position of stearoyl-ACP to produce oleoyl-ACP.
This enzyme is responsible for the conversion of saturated fatty acids to unsaturated fatty acids in the synthesis of
vegetable oils. - Cyanobacteria desA [3], an enzyme that can introduce a second cis double bond at the delta(12)
position of fatty acid bound to membrane glycerolipids. DesA is involved in chilling tolerance; the phase transition
temperature of lipids of cellular membranes being dependent on the degree of unsaturation of fatty acids of the
membrane lipids. As a signature pattern for this family a conserved region in the C-terminal part of these enzymes was
selected.

[0578] Consensus pattern: G-E-x-[FY]-H-N-[FY]-H-H-x-F-P-x-D-Y-
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Consensus pattern: fS-n-ISAJ-xOHQRJKL^S.ej-D-Y-xt^-fUVMFYWHLIVMUDEJ- 

1 3] Wada K, Gombos Z. Murata N. Nature 347:200-203(1990) 2514<1991 >- 
[0579] 190. Fructose-1-6-bisphosphatase active site (FBPase)

found in most on*ntanJS*J^^ EcT^Sw^ me,abo,ic "■""«*■ ™» 

chtoropbst and in photosynthetic bacteria £a 2S ■Te^Jt^?^** (2J ' S 30 en2yme found P*"' 
heptulose 7-phosphate. a s , ep m tne cJh! SS? sedoheptulose 1,7-bisphosphate to sedc- 

relatedtoFBPase. In mammalian FBPase ri^2LSSSL£?^! ? " * and s,ructura "y 

[3]. The region aroundthis residue is high J c^sT^ 
It must be noted that, in some bacterial FBpT^™ 
-™-^^^ 

[ 1] Benkovic S.J., DeMaine M.M. Adv. Enzymol. 53:45-82(1982).

191. FGGY family of carbohydrate kinases signatures

ZZim (9eneg. P K). -xj^^l^ SSSSTl 2^7® ^ " G ^inas7(EC 
enzymes are proteins of from 480 to 520 ^SiuJ££! ™ ^ (EC ^- 7153 J 0 ene ^.These 

conserved regionswere selected, one 1' To « his ,ami * « •*«»• two 

[0581] Consensus pattern" [MFYGS -x-fPsTS ™ ^iS^.? ° C ' termina, SeC,ioa 
Consensus pattern- [^"jKmp^ J? ™« 2 ^ ,VMFYW J- x - W -t UV MF]-x-[DENQTKR h [ENQHf 

pSi^^ 

protein folding by catalyzing the cis-trans s^ffifin T^' PP ' aS8 * a " en2yme ,hat Urates 
three different forms of FKBP are ^£ZZ£Z^££ iT^k^"* "* ° H ^ e P ,ides M" .east 
FK506 and rapamycin. - FKBP-1 3, which is membranlT^f * " . ^ ' ^ ' S Cy,OSO,ic ^ ' mhib *«* »V both 
25, which is preferential inhbitedtyT^ 

s.milari.ies[5.6,7J with the following pLeL - ^FuncSSp m™ I and show °*° n ™° 

p59). HB. is a protein which binds to hspi an^ rnmunophi.in (HBI) (also ca.led 

which seems to be functional. - The C-tenriinal paTo Z l^Zt , 5 " rtS N " ,emr,inal sec,ion " the °< 
with macrophage infection by an un^ZrmS^ 

domain followed by an histidlne-rich meS-SSo £Li T IT K C °" T, [8J ' 3 pro,eh with a N^^minal FKBP 
- Escherichia coli s.pA. - Bacteria. trigge^^^ ' 001 ,k,B < FKBP22 ) 

protein. - Chlamydia trachomatis 27 Kd membwtrJ^n h£T hy9rOSC °P us and chrysomallus FKSOS-binding 
PPiases from Haemophilus influenzae <2££j 1»k " me "'"9i«idfe strah C1 14 PPiase. - Probable 

fluorescensandPseu^ase ae - PseudomonaJ 

on a conserved region in the N-terminus of HW ZZT TT"™ deve ^ °"» * -ased 

the complete domain. l0Ca,ed ln ,he cen,ral sec,IOf »- The profile for FKBP spans 

! 3 2tt£X7Z2Z£ir~ tmm E Fx Na "" 9 

ssssitisss* ^ H • ^ ° ■ ^ j ■ pj • «* «h « 

1 5| Trandinh C.C.. n» G.M., Ssie, MH. Jt. FASE8 J. 6-3410-3420(1982). 
[ 6] Galat A. Eur. J. Biochem. 216:689-707(1993).

[ 7] Hacker J., Fischer G. Mol. Microbiol. 10:445-456(1993).

[ 8] Wuelfing C., Lombardero J., Plueckthun A. J. Biol. Chem. 269:2895-2901(1994).

[0586] 193. MAPEG family (aka: FLAP/GST2/LTC4S family signature)
[0587] The following mammalian proteins are evolutionarily related [1]:

- Leukotriene C4 synthase (EC 2.5.1.37) (gene LTC4S), an enzyme that catalyzes the production of LTC4 from LTA4.

- Microsomal glutathione S-transferase II (EC 2.5.1.18) (GST-II) (gene GST2), an enzyme that can also produce
LTC4 from LTA4.

- 5-lipoxygenase activating protein (gene FLAP), a protein that seems to be required for the activation of 5-lipoxy- 
genase. 

[0588] These are proteins of 150 to 160 residues that contain three transmembrane segments. As a signature pattern,
a conserved region between the first and second transmembrane domains was selected.
[0589] Consensus pattern: G-x(3)-F-E-R-V-[FY]-x-A-[NQ]-x-N-C

[0590] [1] Jakobsson P.-J., Mancini J.A., Ford-Hutchinson A.W. J. Biol. Chem. 271:22203-22210(1996).
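
The two criteria stated above for this family (a length of 150 to 160 residues and the presence of the signature pattern) can be combined into a simple screen. The sketch below is an illustration under two assumptions: the pattern reads G-x(3)-F-E-R-V-[FY]-x-A-[NQ]-x-N-C, and the length range is applied as a hard filter, which the text does not require. The function name and test sequences are hypothetical.

```python
import re

# Hand-translated form of G-x(3)-F-E-R-V-[FY]-x-A-[NQ]-x-N-C.
MAPEG_SIGNATURE = re.compile(r"G.{3}FERV[FY].A[NQ].NC")

def looks_like_mapeg(seq: str, min_len: int = 150, max_len: int = 160) -> bool:
    """Screen a sequence by length and by presence of the signature.

    Treating the 150 to 160 residue range as a strict cut-off is an
    assumption of this sketch, not a rule stated in the text.
    """
    if not (min_len <= len(seq) <= max_len):
        return False
    return MAPEG_SIGNATURE.search(seq) is not None

# Hypothetical 155-residue sequence carrying the motif.
motif = "GAAAFERVYAANANC"
candidate = "M" + "L" * 49 + motif + "L" * 90
print(len(candidate), looks_like_mapeg(candidate))
```

Relaxing `min_len` and `max_len` turns the function into a pure pattern search, which may be preferable for fragments or fusion proteins.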
[0591] 194. FMN-dependent alpha-hydroxy acid dehydrogenases active site (FMN_dh)

A number of oxidoreductases that act on alpha-hydroxy acids and which are FMN-containing flavoproteins have been
shown [1,2,3] to be structurally related; these enzymes are: - Lactate dehydrogenase (EC 1.1.2.3), which consists of
a dehydrogenase domain and a heme-binding domain called cytochrome b2 and which catalyzes the conversion of
lactate into pyruvate. - Glycolate oxidase (EC 1.1.3.15) ((S)-2-hydroxy-acid oxidase), a peroxisomal enzyme that
catalyzes the conversion of glycolate and oxygen to glyoxylate and hydrogen peroxide. - Long chain alpha-hydroxy acid
oxidase from rat (EC 1.1.3.15), a peroxisomal enzyme. - Lactate 2-monooxygenase (EC 1.13.12.4) (lactate oxidase)
from Mycobacterium smegmatis, which catalyzes the conversion of lactate and oxygen to acetate, carbon dioxide and
water. - (S)-mandelate dehydrogenase from Pseudomonas putida (gene mdlB), which catalyzes the oxidation of
(S)-mandelate to benzoylformate. The first step in the reaction mechanism of these enzymes is the abstraction of the
proton from the alpha-carbon of the substrate producing a carbanion which can subsequently attach to the N5 atom
of FMN. A conserved histidine has been shown [4] to be involved in the removal of the proton. The region around this
active site residue is highly conserved and contains an arginine residue which is involved in substrate binding.
[0592] Consensus pattern: S-N-H-G-[AG]-R-Q [H is the active site residue] [R is a substrate-binding residue]

[ 1] Giegel D.A., Williams C.H. Jr., Massey V. J. Biol. Chem. 265:6626-6632(1990).

[ 2] Tsou A.Y., Ransom S.C., Gerlt J.A., Buechter D.D., Babbitt P.C., Kenyon G.L. Biochemistry 29:9856-9862
(1990).

[ 3] Le K.H.D., Lederer F. J. Biol. Chem. 266:20877-20880(1991).
[ 4] Lindqvist Y., Branden C.-I. J. Biol. Chem. 264:3624-3628(1989).
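
Because the FMN_dh pattern above annotates two of its positions (the active site histidine and the substrate-binding arginine), a match can be turned directly into residue coordinates. The following sketch is an illustration only; the function name, the dictionary keys and the test sequence are hypothetical.

```python
import re

# S-N-H-G-[AG]-R-Q: His is the 3rd and Arg the 6th residue of the match.
FMN_DH_SITE = re.compile(r"SNHG[AG]RQ")

def annotate_active_site(seq: str) -> list:
    """Return 1-based positions of the annotated residues for each match."""
    annotations = []
    for m in FMN_DH_SITE.finditer(seq):
        annotations.append({
            "active_site_His": m.start() + 3,        # third residue of the match
            "substrate_binding_Arg": m.start() + 6,  # sixth residue of the match
        })
    return annotations

# Hypothetical sequence fragment, for demonstration only.
example_seq = "AAAASNHGGRQLL"
sites = annotate_active_site(example_seq)
```

For the example fragment the match begins at residue 5, so the histidine is reported at position 7 and the arginine at position 10.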

[0593] 195. Flavin-binding monooxygenase-like (FMO-like)

[0594] This family includes FMO proteins, cyclohexanone monooxygenase 

[0595] 196. (FPGS) 

Folylpolyglutamate synthase signatures (aka Mur_ligase)

[0596] Folylpolyglutamate synthase (EC 6.3.2.17) (FPGS) [1] is the enzyme of folate metabolism that catalyzes
ATP-dependent addition of glutamate moieties to tetrahydrofolate.

[0597] Its sequence is moderately conserved between prokaryotes (gene folC) and eukaryotes. We developed two
signature patterns based on the conserved regions which are rich in glycine residues and could play a role in the
catalytic activity and/or in substrate binding.

[0598] Consensus pattern: [LIVMFY]-x-[LIVM]-[STAG]-G-T-[NK]-G-K-x-[ST]-x(7)-[LIVM](2)-x(3)-[GSK]. Sequences
known to belong to this class detected by the pattern: ALL.

[0599] Consensus pattern: [LIVMFY](2)-E-x-G-[LIVM]-[GA]-G-x(2)-D-x-[GST]-x-[LIVM](2). Sequences known to
belong to this class detected by the pattern: ALL.

[0600] [ 1] Shane B., Garrow T., Brenner A., Chen L., Choi Y.J., Hsu J.C., Stover P. Adv. Exp. Med. Biol. 338:629-634
(1993).

[0601] 197. FYVE zinc finger

[0602] The FYVE zinc finger is named after four proteins that it has been found in: Fab1, YOTB/ZK632.12, Vac1,
and EEA1. The FYVE finger has been shown to bind two Zn++ ions [1]. The FYVE finger has eight potential zinc
coordinating cysteine positions. Many members of this family also include two histidines in a motif R+HHC+XCG, where
+ represents a charged residue and X any residue. Members were included which do not conserve these histidine
residues but are clearly related
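
The motif R+HHC+XCG described above can be written as a regular expression once "charged residue" is given a concrete meaning. The sketch below assumes that + stands for D, E, K or R (histidine is deliberately left out); this choice, the function name and the test fragments are assumptions made for illustration and are not stated in this disclosure.

```python
import re

# Assumption: "+" (charged residue) is taken to mean D, E, K or R.
CHARGED = "DEKR"
FYVE_MOTIF = re.compile(f"R[{CHARGED}]HHC[{CHARGED}].CG")

def find_fyve_motif(seq: str) -> list:
    """Return the 1-based start positions of the R+HHC+XCG motif."""
    return [m.start() + 1 for m in FYVE_MOTIF.finditer(seq)]

# Hypothetical fragments, for demonstration only.
print(find_fyve_motif("AAARKHHCRACGAA"))   # motif present
print(find_fyve_motif("AAARAHHCRACGAA"))   # second residue is not charged
```

As the text notes, some family members do not conserve the two histidines, so the absence of this motif should not by itself exclude a sequence from the family.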
F-actin capping protein beta subunit signature

[060BJ p|A™t™daJ.F..C«» r JF Tadwl^S^r™ ° * *"**««"» ".pattern ALL 
IP«J m „ ^.^^ ^ ^ A 

Isopemciltan N synthetase (IPNS) fl 21 is a kev B n™L k u • 
°™°<°xy9en,«remcves^ 

to form the azetidinone and thiazolidine rings of iSSt T '-^■r'^^'H-cysteinyW-valine 

Two cysteines are conserved in fungal and be2e«E2 t2Si^^ m0 °' ^ 330 a ™»*<* residues, 
substrate-binding. Pephaloeporlum in*^^ in ^'"^9 and/or 

biosynthesH, The DAOCS doma*, which is sTuSu^S 

etoxy-cephalosporin C - used as a substrate by DACS toS^?^??* 9 S,6p ,rWn penic, ' llin N ,0 Re- 
possesses a monofunctional DAOCS en^ (gene cefB iT^ZT^^" & ^"•^^^■geru. 
enzymes were der^ed. centered arour^r^^ 
[0610] Consensus pattern: [RK]-x-[STA]-x(2)-S-x-C-Y-[SL]-
Consensus pattern: [UVMJ(2)-xO-G-{STA]-x(2HSTAGJ-x(2K-x- { DNGh 

[ 1] Martin J.F. Trends Biotechnol. 5:306-308(1987).

S.W.. .ngolia T.D. OonJ^^^^S^ ' ^ ^ ^ WX ' Mi " er J R - Ql — 

[ 4] Kovacevic S., Weigel B.J., Tobin M.B., Ingolia T.D., Miller J.R. J. Bacteriol. 171:754-760(1989).
[0611] 200. Fibrillarin signature

RNAs [2,. Fibrin is an extLeiy wTcTrv^ U8a " d "^mal^ear 

of three different domains: - An N-terminal domain of about 80 ami Sh? k k reS ' dU8S S,ruc,ural| y » consists 
a number of dimethylated arginine residues (S A central f ™ ^ ^ 9 * cine and c °^ns 

RNA-binding proteins and contains an *J2ZL£Z Tl^ZZ " W * feSemb,es ,hat - 

A C-terminal alpha-helical domain. A protein evoluS rebtS. ^ « t C °" Sensus found >" such proteins. - 
such as Methanococcus vannielii or voltae ThTs M^Sa^ ' 0Und [3 ' in ^chaebacteria 

GV/Arg-richN-terminaldomain. As a signature pa«e^^^^^ " P '^ RNA ^^9- It lacks the 

2 like octapeptide sequence. epanern, a regen was selected that starts with and encompasestheRNP- 

[0612] Consensus pattern: IMTHUVMAP^Y^^ 

[1] Aris J.P., Blobel G. Proc. Natl. Acad. Sci. U.S.A. 88:931-935(1991).

3 XT? R /' SWanSOn M S ' °^ uss G Gene! D ev 3^31 437(1989, 
[3] Agha-Amin K. J. Bacteriol. 176:2124-2127(1994).

[0613] 201. Filamin/ABP280 repeat

subgroups depending upon the physiological nature of the iron-sulfur
cluster(s) and according to sequence similarities. One of these subgroups is the 2Fe-2S ferredoxins, which are proteins
or domains of around one hundred amino acid residues that bind a single 2Fe-2S iron-sulfur cluster. The proteins
that are known [2] to belong to this family are listed below. - Ferredoxin from photosynthetic organisms; namely plants
and algae where it is located in the chloroplast or cyanelle; and cyanobacteria. - Ferredoxin from archaebacteria of
the Halobacterium genus. - Ferredoxin IV (gene pftA) and V (gene fdxD) from Rhodobacter capsulatus. - Ferredoxin
in the toluene degradation operon (gene xylT) and naphthalene degradation operon (gene nahT) of Pseudomonas
putida. - Hypothetical Escherichia coli protein yfaE. - The N-terminal domain of the bifunctional ferredoxin/ferredoxin
reductase electron transfer component of the benzoate 1,2-dioxygenase complex (gene benC) from Acinetobacter
calcoaceticus, the toluene 4-monooxygenase complex (gene tmoF), the toluate 1,2-dioxygenase system (gene xylZ),
and the xylene monooxygenase system (gene xylA) from Pseudomonas. - The N-terminal domain of phenol hydroxylase
protein p5 (gene dmpP) from Pseudomonas putida. - The N-terminal domain of methane monooxygenase component C
(gene mmoC) from Methylococcus capsulatus. - The C-terminal domain of the vanillate degradation pathway
protein vanB in a Pseudomonas species. - The N-terminal domain of bacterial fumarate reductase iron-sulfur protein
(gene frdB). - The N-terminal domain of CDP-6-deoxy-3,4-glucoseen reductase (gene ascD) from Yersinia
pseudotuberculosis. - The central domain of eukaryotic succinate dehydrogenase (ubiquinone) iron-sulfur protein. - The
N-terminal domain of eukaryotic xanthine dehydrogenase. - The N-terminal domain of eukaryotic aldehyde oxidase. In
the 2Fe-2S ferredoxins, four cysteine residues bind the iron-sulfur cluster. Three of these cysteines are clustered
together in the same region of the protein. Our signature pattern spans that iron-sulfur binding region.
[0619] Consensus pattern: C-{C}-{C}-[GA]-{C}-C-[GAST]-{CPDEKRHFYW}-C [The three C's are 2Fe-2S ligands]-
[1] Meyer J. Trends Ecol. Evol. 3:222-226(1988).
[2] Harayama S., Polissi A., Rekik M. FEBS Lett. 285:85-88(1991).
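
The 2Fe-2S consensus pattern combines allowed-residue sets in square brackets with excluded-residue sets in curly braces. As a minimal sketch of how such a pattern could be scanned for, the Python snippet below hand-translates it into a regular expression, reading the pattern as C-{C}-{C}-[GA]-{C}-C-[GAST]-{CPDEKRHFYW}-C. The function name and the two toy peptides are hypothetical and were made up for illustration; they are not sequences from this document.

```python
import re

# Assumed reading of the 2Fe-2S ferredoxin pattern:
#   C-{C}-{C}-[GA]-{C}-C-[GAST]-{CPDEKRHFYW}-C
# [..] lists residues that are allowed at a position,
# {..} lists residues that are NOT allowed, so {C} becomes [^C].
FER2_REGEX = re.compile(r"C[^C][^C][GA][^C]C[GAST][^CPDEKRHFYW]C")


def find_fer2_sites(sequence):
    """Return (1-based start, matched text) for each non-overlapping match."""
    return [(m.start() + 1, m.group())
            for m in FER2_REGEX.finditer(sequence.upper())]


# Hypothetical peptides, invented for this example only.
toy_hit = "MKTCRAGSCSSCAGKV"
toy_miss = "MKTCRAGSCSPCAGKV"  # proline sits at the position that excludes P

print(find_fer2_sites(toy_hit))
print(find_fer2_sites(toy_miss))
```

The second peptide differs from the first by a single residue at the position governed by {CPDEKRHFYW}, which is enough to abolish the match.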
[0620] Adrenodoxin family, iron-sulfur binding region signature (fer2B) 

Ferredoxins [1] are a group of iron-sulfur proteins which mediate electron transfer in a wide variety of metabolic reac- 
tions. Ferredoxins can be divided into several subgroups depending upon the physiological nature of the iron sulfur 
cluster(s) and according to sequence similarities. One family of ferredoxins groups together the following proteins that 
all bind a single 2Fe-2S iron-sulfur cluster: - Adrenodoxin (ADX) (adrenal ferredoxin), a vertebrate mitochondrial protein 
which transfers electrons from adrenodoxin reductase to cytochrome P450scc, which is involved in cholesterol side 
chain cleavage. - Putidaredoxin (PTX), a Pseudomonas putida protein which transfers electrons from putidaredoxin 
reductase to cytochrome P450-cam, which is involved in the oxidation of camphor. - Terpredoxin [2], a Pseudomonas 
protein which transfers electrons from terpredoxin reductase to cytochrome
P450-terp, which is involved in the oxidation of alpha-terpineol. - Rhodocoxin [3], a Rhodococcus protein which transfers
electrons from rhodocoxin reductase to cytochrome CYP116 (thcB), which is involved in the degradation of thiocarbamate
herbicides. - Escherichia coli ferredoxin (gene fdx) [4] whose exact function is not yet known. - Rhodobacter
capsulatus ferredoxin VI [5], which may transfer electrons to a yet uncharacterized oxygenase. - Caulobacter crescen- 
tus ferredoxin (gene fdxB) [6]. In these proteins, four cysteine residues bind the iron-sulfur cluster. Three of these 
cysteines are clustered together in the same region of the protein. Our signature pattern spans that iron-sulfur binding
region. 

[0621] Consensus pattern: C-x(2)-[STAQ]-x-[STAMV]-C-[STA]-T-C-[HR] [The three C's are 2Fe-2S ligands]-
[1] Meyer J. Trends Ecol. Evol. 3:222-226(1988).
[2] Peterson J.A., Lu J.-Y., Geisselsoder J., Graham-Lorence S., Carmona C., Witney F., Lorence M.C. J. Biol.
Chem. 267:14193-14203(1992).
[3] Nagy I., Schoofs G., Compernolle F., Proost P., Vanderleyden J., De Mot R. J. Bacteriol. 177:676-687(1995).
[4] Ta D.T., Vickery L.E. J. Biol. Chem. 267:11120-11125(1992).
[5] Naud I., Vincon M., Garin J., Gaillard J., Forest E., Jouanneau Y. Eur. J. Biochem. 222:933-939(1994).
[6] Amemiya K. EMBL/GenBank: X51607.

[0622] 204. 4Fe-4S ferredoxins, iron-sulfur binding region signature (fer4) 

Ferredoxins [1] are a group of iron-sulfur proteins which mediate electron transfer in a wide variety of metabolic reactions.
Ferredoxins can be divided into several subgroups depending upon the physiological nature of the iron-sulfur
cluster(s). One of these subgroups is the 4Fe-4S ferredoxins, which are found in bacteria and which are thus often
referred to as 'bacterial-type' ferredoxins. The structure of these proteins [2] consists of the duplication of a domain of
twenty-six amino acid residues; each of these domains contains four cysteine residues that bind to a 4Fe-4S center.
A number of proteins have been found [3] that include one or more 4Fe-4S binding domains similar to those of bacterial-type
ferredoxins. These proteins are listed below (references are only provided for recently determined sequences). -
[0623] The iron-sulfur proteins of the succinate dehydrogenase and the fumarate reductase complexes (EC 1.3.99.1).
These enzyme complexes, which are components of the tricarboxylic acid cycle, each contain three subunits: a flavoprotein,
an iron-sulfur protein, and a b-type cytochrome. The iron-sulfur proteins contain three different iron-sulfur
centers: a 2Fe-2S, a 3Fe-3S and a 4Fe-4S. - Escherichia coli anaerobic glycerol-3-phosphate dehydrogenase (EC
1.1.99.5). This enzyme is composed of three subunits: A, B, and C. The C subunit seems to be an iron-sulfur protein
with two ferredoxin-like domains in the N-terminal part of the protein.

reductase. The B subunit of this enzyme (gen™ ^ "* dimet ^ sutt °*"e 

-Escherichia coli formate hydrocele T w^^ 

■> ^'on-suKurproteinsthateach^ 

dehydrogenase (EC 12^2). This enzyme is use^T^!^ 'T ^^"^^^'ormicicum formate 
dimeric enzyme probands two 4fL| clnte^ E^he^a^ , ? ^ °" ,0m,ate - The chain of »* 
The beta chain of these two enzymes (ge^fdnH a ndSlf ' rna,e ^"^^^es N andO (EC 1.2 1 2 ) 

domains. - Desuffovibrio PeriplaLiT^^^^^ 

' three 4Fe-4S centers, two of whfch a e ^'f " ,his di ™"c enzyme binds 

teriumtherrrrcautrophfcummetbyl^^ re 9'°" <* P^ein. - Methanobac- 

reductase (EC 1.8.1.-) [4J. Two of the subunits of this ^Z^Z » Sa^oneHa typhimurium anaerobic suffite 

centers. - A FerredoxinYke protein (gene ^ S^SSXl *" 7* ,0 ^ bind ,WO ^ 
and one from the Nif-region of Azot^cl^S f.*"* 5 , 10 "* of various Rhizobium species. 

psaC). mis protein contains two tow poJEKS cemer! ^SSSSi ' « ^ 

protem which is predicted to carry two 4F«MS centers ?S2S A ^ B CGn,ers - * 100 <*l°roplast "*B 
Entamobea histotytica. - EscheriL coli ^etSpr^^ mTJZ?^T, ""^ the en,erfe amoeba 
radical activating enzymes family (see 'mao^^^lS^Sl * ^ ***** to ,h * 
idues in the iron-sulfur region is suffkienT^SScSj^^ T, J h cente ' sT "« Pa«em of cysteine res- 

[0624] Consensus pattern: C-x(2)-C-x(2)-C-x(3)-C-[PEG] [The four C's are 4Fe-4S ligands]-

[1] Meyer J. Trends Ecol. Evol. 3:222-226(1988).
[2] Otaka E., Ooi T. J. Mol. Evol. 26:257-267(1987).
[3] Beinert H. FASEB J. 4:2483-2492(1990).

! 2 £Tn B " rrBB E L J - 173:1544-1553(1991) 
[5] Knaff D.B. Trends Biochem. Sci. 13:460-461(1988).

[0625] 205. NifH/frxC family signatures (fer4_NifH) 

^Si h -zsii , j r h 9ica ' nftro9en roca,fon * • 

of nitrogen to ammonia and component 2 £S£ heSo^ll r COn,a,nS ^ Site ,or *• redu <*°n 
ni.H)v*ichbindsasing.e4Fe^ 

such as ferredoxin; the reduced protein then tranSZt^S f,Xat,on P rocess " ,fH **« reduced by a protein 
ATP. A number of proteins are known to ^^^ ^ '^T 1 ^ *" COnCO,ni,an, '""sumption of 
(or chIL) protein [3]. FrxC is encoded on t» chE£2 Z^^^F*"!* "« " Chk>r0 ' 5,as, »«C 
butrtcouWactasane.ectroncarrierintheconve^p^^ 

protems bchL and bchX [4]. These proteins are also HkeK to » 2? X " d8, °f toro P h y" ld «- 'Rnodobactercapsulatus 

of conserved regions in the sequence of mLe P Sein S t h S tor0Phy " Synthesis - There are a number 

<P-loop)andinthecen, ral sec^ 

of the 4Fe-4S cluster. Two signatures patterns.^™ 

[0626] Consensus pattern: E-x-G-G VxSf^SSS sTr h 9 ™! aTOUnd mese Cvs,eines developed. 

Consensus pattern: r>x-L-G-D-V-V-C-G-G -F- AgL F^rcSlL " I" '^'^ ""^ 

1m^j-x-h [O binds the rron-suHur center]- 

[ 1] Pau R.N. Trends Biochem. Sci. 14:183-186(1989) 

1 4] Burke D.H.. Albert! M., Hears, J.E. J. Ic^S^^^. ^ "* "* 13:551 ^K1989). 
[0627] 206. Ferritin iron-binding regions signatures 

anTa 1 :;:^ 

animals me protein is mainly cyXopiasm^^e 'lZnZt ^ 5 * " a " aqueous environment In 

subunits (in mammals there are i ^s^s^Z^T " ^ ^ eTOOdeS ,or ctose ^ -'ated 

chtoropbst^ThereareanumberofwenclTrv^reo^^^^^^ 
signature patterns were selected. The firs, pTnern^S 
three conserved Qlutamate which are thoug^^^^^^ 
-rminalsec^^^ 
ions can gain access to the central cavity of the molecule; this pattern also includes conserved acidic residues which
are potential metal-binding sites.

[0628] Consensus pattern: E-x-[KR]-E-x(2)-E-[KR]-[LF]-[LIVMA]-x(2)-Q-N-x-R-x-G-R [The 3 E's are potential iron
ligands]-

Consensus pattern: D-x(2)-[LIVMF]-[STAC]-[DH]-F-[LI]-[EN]-x(2)-[FY]-L-x(6)-[LIVM]-[KN] [The second D and the E are
potential iron ligands]-

[1] Crichton R.R., Charloteaux-Wauters M. Eur. J. Biochem. 164:485-506(1987).
[2] Theil E.C. Annu. Rev. Biochem. 56:289-315(1987).
[3] Ragland M., Briat J.-F., Gagnon J., Laulhere J.-P., Massenet O., Theil E.C. J. Biol. Chem. 265:18339-18344
(1990).

[0629] 207. Intermediate filaments signature (filament) 

Intermediate filaments (IF) [1,2,3] are proteins which are primordial components of the cytoskeleton and the nuclear
envelope. They generally form filamentous structures 8 to 14 nm wide. IF proteins are members of a very large multigene
family of proteins which has been subdivided into five major subgroups: - Type I: Acidic cytokeratins. - Type II: Basic
cytokeratins. - Type III: Vimentin, desmin, glial fibrillary acidic protein (GFAP), peripherin and plasticin. - Type IV:
Neurofilaments L, H and M, alpha-internexin and nestin. - Type V: Nuclear lamins A, B1, B2 and C. All IF proteins are
structurally similar in that they consist of: a central rod domain comprising some 300 to 350 residues which is arranged
in coiled-coil alpha-helices, with at least two short characteristic interruptions; an N-terminal non-helical domain (head)
of variable length; and a C-terminal domain (tail) which is also non-helical, and which shows extreme length variation
between different IF proteins. While IF proteins are evolutionarily and structurally related, they have limited sequence
homologies except in several regions of the rod domain. A conserved region at the C-terminal extremity of the rod
domain was used as a sequence pattern for this class of proteins.
[0630] Consensus pattern: [IV]-x-[TACI]-Y-[RKH]-x-[LM]-L-[DE]-

[1] Quinlan R., Hutchison C., Lane B. Protein Prof. 2:801-952(1995).
[2] Steinert P.M., Roop D.R. Annu. Rev. Biochem. 57:593-625(1988).
[3] Stewart M. Curr. Opin. Cell Biol. 2:91-100(1990).

[0631] 208. Flavodoxin signature

Flavodoxins [1,E1] are electron-transfer proteins that function in various electron transport systems. Flavodoxins bind
one FMN molecule, which serves as a redox-active prosthetic group. Flavodoxins are functionally interchangeable with
ferredoxins. They have been isolated from prokaryotes, cyanobacteria, and some eukaryotic algae. The signature
pattern for these proteins is derived from a conserved region in their N-terminal section; this region is involved in the
binding of the FMN phosphate group.

[0632] Consensus pattern: [LIV]-[LIVFY]-[FY]-x-[ST]-x(2)-[AGC]-x-T-x(3)-A-x(2)-[LIV]-
[1] Wakabayashi S., Kimura K., Matsubara H., Rogers L.J. Biochem. J. 263:981-984(1989).
[0633] 209. Growth factor and cytokines receptors family signatures (fn3) 

A number of receptors for lymphokines, hematopoietic growth factors and growth hormone-related molecules have
been found [1 to 5] to share a common binding domain. Receptors known to belong to this family are: - Cytokine
receptor common beta chain. This chain is common to the IL-3, IL-5 and GM-CSF receptors. - Cytokine receptor
common gamma chain. This chain is common to the IL-2, IL-4, IL-7 and IL-13 receptors. - Ciliary neurotrophic factor
receptor (CNTFR). - Erythropoietin receptor (EPOR). - Granulocyte colony-stimulating factor receptor (G-CSFR). -
Granulocyte-macrophage colony-stimulating factor receptor alpha chain (GM-CSFR). - Interleukin-2 receptor beta
chain (IL2R-beta). - Interleukin-3 receptor alpha chain (IL3R). - Interleukin-4 receptor alpha chain (IL4R). -
Interleukin-5 receptor alpha chain (IL5R). - Interleukin-6 receptor (IL6R). - Interleukin-7 receptor alpha chain (IL7R). -
Interleukin-9 receptor (IL9R). - Growth hormone receptor (GRHR). - Prolactin receptor (PRLR). - Thrombopoietin receptor (TPOR).
The conserved region constitutes all or part of the extracellular ligand-binding region and is about 200 amino acid
residues long. In the N-terminal part of this domain there are two pairs of cysteines known, in the growth hormone receptor,
to be involved in disulfide bonds.
[Schematic: an extracellular region carrying four conserved cysteines (C), followed by a transmembrane segment and a cytoplasmic region.]
Two patterns to detect this family of receptors were used. The first one is derived from the first N-terminal disulfide
loop, the second is a tryptophan-rich pattern located at the C-terminal extremity of the extracellular region.
[0634] Consensus pattern: C-[LVFYR]-x(7,8)-[STIVDN]-C-x-W [The two C's are linked by a disulfide bond]-
Consensus pattern: [STGL]-x-W-[SG]-x-W-S- 
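
The two receptor patterns illustrate a variable-length gap, x(7,8), and a short tryptophan-rich box. The Python sketch below is one possible way to test a sequence for both, under the assumption that the first pattern reads C-[LVFYR]-x(7,8)-[STIVDN]-C-x-W. The function name, the dictionary keys and the toy peptide are hypothetical; the peptide is invented and is not a real receptor sequence.

```python
import re

# Pattern 1: C-[LVFYR]-x(7,8)-[STIVDN]-C-x-W
# x(7,8) means "any 7 or 8 residues", which maps to .{7,8} in a regex.
DISULFIDE_LOOP = re.compile(r"C[LVFYR].{7,8}[STIVDN]C.W")

# Pattern 2: [STGL]-x-W-[SG]-x-W-S
WSXWS_BOX = re.compile(r"[STGL].W[SG].WS")


def classify(sequence):
    """Report where each pattern first matches (1-based), if at all."""
    seq = sequence.upper()
    loop = DISULFIDE_LOOP.search(seq)
    box = WSXWS_BOX.search(seq)
    return {
        "loop_start": loop.start() + 1 if loop else None,
        "box_start": box.start() + 1 if box else None,
        # The text places the loop N-terminal to the tryptophan-rich box.
        "both_in_order": bool(loop and box and loop.end() <= box.start()),
    }


# Hypothetical peptide invented for this example.
toy = "AACLGGGGGGGSCKWPPPPSEWSEWSAA"
print(classify(toy))
```

Requiring the loop to end before the box begins mirrors the description above, in which the disulfide loop lies near the N-terminus and the tryptophan-rich pattern at the C-terminal extremity of the extracellular region.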

[1] Bazan J.F. Biochem. Biophys. Res. Commun. 164:788-795(1989).
[2] Bazan J.F. Proc. Natl. Acad. Sci. U.S.A. 87:6934-6938(1990).

[4] D'Andrea A.D., Fasman G.D., Lodish H.F. Cell 58:1023-1024(1989).

[5] D'Andrea A.D., Fasman G.D., Lodish H.F. Curr. Opin. Cell Biol. 2:648-651(1990).

Phosphonbosylgrycnamide formyltransferase (EC 212 2) fGARn rn L^lJT L , 

synthesis, the transfer of a formyl group to 5'Sh^riSS^ ' 1 ^ ^ * ,h,fd Step h de ™° P urine bi °- 
multifunctional enzyme polypeptl^caUSre^ ^^T' " T" euteW ' GART is P*" <* a 
yeast, Q^haiLI^pJES^Sl^S 7?k "» »«"* P^ts and 

acid residue has been shown to be inv^e f^U^^^"^ * e Esche ^ «>« enzyme, anaspartic 
writ conserved in GART from P^^csLe^^™ ^ ^ ™ S active s « e * 

'°rmyl»e,rarvdrofo.a.edeh^^ 

3 l *» dRS - WW" C J- Biol. Chm 266:496IM973M991> 

[0637] 211. G10 protein signatures 

[0642] 213. Glucose-6-phosphate dehydrogenase active site (G6PD)
Glucose-6-phosphate dehydrogenase (EC 1 i i 491 /fifipm n .u. «• 

reduction of g.ucose-6-phosphlte to fllStaSS fohShl i S ' 6P * pentose P 8 "™"* 

nucleophile associated with the actiiTthH^^^ ^' ne ° ^ ""^ iden,if,ed 38 are ac,ive 

bacjera, ,0 mammal G6PD's and I LT^sTs^^Z ^ " ^ «"-"'- «™ 

£2 f lT J T US 1 P T m: D " H - Y -'- G - | <-[EQ'<]IK is the actrve site residuej- 

- GATA-1 11] (also known as E^l S or NF El 1 ^hLh iv rt """""V to belong ,0 this family are: 

expressed in erythroid celts. „ K ^ °' ^ 96068 ** ° ,he ' 9 e "<* 

roid development. - GATA-2 [21 a tS^S IT 8e,ve8 35 a 96nera ' ' swi,ch ' fac '°< «» eryth- 

ceHs. - GATA-3 [3]. a t«««3^S2nS^^ T*" T^" 1 6Xpr6SSi0n in e "^' 
" GATA-4 [4], a transcrpti^ 

pannier (or DGATAa) (gene pnr) which ^alTl^e^T^l T ^ ' DrOSOphi,a P'° ,ei " 
section of these enzymes has been selected as a signature pattern for this family of TK.

Consensus pattern: [GA]-x(1,2)-[DE]-x-Y-x-[STAP]-x-C-[NKR]-x-[CH]-[LIVMFYWH]
[1] Boyle D.B., Coupar B.E.H., Gibbs A.J., Seigman L.J., Both G.W. Virology 156:355-365(1987).
[2] Blasco R., Lopez-Otin C., Munoz M., Bockamp E.-O., Simon-Mateo C., Vinuela E. Virology 178:301-304(1990).
[3] Robertson G.R., Whalley J.M. Nucleic Acids Res. 16:11303-11317(1988).

[1523] 641. Thymidine kinase from herpesvirus (TK herpes)
[1] Medline: 96003730. Crystal structures of the thymidine kinase from herpes simplex virus type-1 in complex with
deoxythymidine and ganciclovir. Brown DG, Visse R, Sandhu G, Davies A, Rizkallah PJ, Melitz C, Summers WC,
Sanderson MR; Nat Struct Biol 1995;2:876-881.
Number of members: 65

[1524] 642. Nuclear transition protein 2 signatures (TP2) 

In mammals, the second stage of spermatogenesis is characterized by the conversion of nucleosomal chromatin to 
the compact, non-nucleosomal and transcriptionally inactive form found in the sperm nucleus. This condensation is 
associated with a double-protein transition. The first transition corresponds to the replacement of histones by several 
spermatid-specific proteins, also called transition proteins, which are themselves replaced by protamines during the 
second transition. Nuclear transition protein 2 (TP2) is one of those spermatid-specific proteins. TP2 is a basic, zinc- 
binding protein [1] of 116 to 137 amino-acid residues. Structurally, TP2 consists of three distinct parts: a conserved 
serine-rich N-terminal domain of about 25 residues, a variable central domain of 20 to 50 residues which contains 
cysteine residues, and a conserved C-terminal domain of about 70 residues rich in lysines and arginines. Two signature 
patterns for TP2 have been developed: one located in the N-terminal domain, the other in the C-terminal. 
Consensus pattern: H-x(3)-H-S-[NS]-S-x-P-Q-S 
Consensus pattern: K-x-R-K-x(2)-E-G-K-x(2)-K-[KR]-K 

[1] Baskaran R., Rao M.R.S. Biochem. Biophys. Res. Commun. 179:1491-1499(1991).
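
Consensus patterns such as the two TP2 signatures follow a regular notation (single residues, bracketed alternatives, braces for exclusions, x for any residue, and repeat counts in parentheses), so they can be translated mechanically. The Python function below is a minimal sketch of such a translation into a regular expression. Its name and the toy peptide are hypothetical, and it covers only the notation elements just listed; anchors and other extensions of the pattern syntax are deliberately not handled.

```python
import re

# One pattern element: x, a residue, [allowed set] or {excluded set},
# optionally followed by a repeat count (n) or range (n,m).
_ELEMENT = re.compile(
    r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?"
)


def prosite_to_regex(pattern):
    """Translate a hyphen-separated consensus pattern into a regex string."""
    parts = []
    # Drop a trailing hyphen or period, then split into elements.
    for element in pattern.strip().rstrip("-.").split("-"):
        m = _ELEMENT.fullmatch(element.strip())
        if m is None:
            raise ValueError(f"unrecognized pattern element: {element!r}")
        core, low, high = m.groups()
        if core == "x":
            core = "."                       # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"   # excluded residues
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)


tp2_n_terminal = prosite_to_regex("H-x(3)-H-S-[NS]-S-x-P-Q-S")
tp2_c_terminal = prosite_to_regex("K-x-R-K-x(2)-E-G-K-x(2)-K-[KR]-K")

# Hypothetical peptide invented for this example, not a real TP2 sequence.
toy = "MAHTRPHSSSAPQSK"
match = re.search(tp2_n_terminal, toy)
print(tp2_n_terminal, tp2_c_terminal, match.start() + 1 if match else None)
```

Raising an error on an unrecognized element, rather than skipping it, is a deliberate choice here: a pattern damaged in transcription then fails loudly instead of silently matching the wrong sequences.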
[1525] 643. Thiamine pyrophosphate enzymes signature (TPP enzymes)

A number of enzymes require thiamine pyrophosphate (TPP) (vitamin B1) as a cofactor. It has been shown [1] that
some of these enzymes are structurally related. These related TPP enzymes are: - Pyruvate oxidase (POX) (EC 1.2.3.3)
Reaction catalyzed: pyruvate + orthophosphate + O(2) + H(2)O = acetyl phosphate + CO(2) + H(2)O(2). - Pyruvate
decarboxylase (PDC) (EC 4.1.1.1) Reaction catalyzed: pyruvate = acetaldehyde + CO(2). - Indolepyruvate decarboxylase
(EC 4.1.1.74) [2] Reaction catalyzed: indole-3-pyruvate = indole-3-acetaldehyde + CO(2). - Acetolactate synthase
(ALS) (EC 4.1.3.18) Reaction catalyzed: 2 pyruvate = acetolactate + CO(2). - Benzoylformate decarboxylase (BFD)
(EC 4.1.1.7) [3] Reaction catalyzed: benzoylformate = benzaldehyde + CO(2). A conserved region which is located in
their C-terminal section has been selected as a signature pattern for these enzymes.
Consensus pattern: [LIVMF]-[GSA]-x(5)-P-x(4)-[LIVMFYW]-x-[LIVMF]-x-G-D-[GSA]-[GSAC]

[1] Green J.B.A. FEBS Lett. 246:1-5(1989).
[2] Koga J., Adachi T., Hidaka H. Mol. Gen. Genet. 226:10-16(1991).
[3] Tsou A.Y., Ransom S.C., Gerlt J.A., Buechter D.D., Babbitt P.C., Kenyon G.L. Biochemistry 29:9856-9862(1990).
[1526] 644. TPR Domain
[1] Medline: 95397415. Tetratrico peptide repeat interactions: to TPR or not to TPR? Lamb JR, Tugendreich S, Hieter P;
Trends Biochem Sci 1995;20:257-259.
[2] Medline: 98151343. The structure of the tetratricopeptide repeats of protein phosphatase 5: implications for
TPR-mediated protein-protein interactions. Das AK, Cohen PW, Barford D; EMBO J 1998;17:1192-1199.
Number of members: 621

[1527] 645. Uroporphyrin-III C-methyltransferase signatures (TP methylase)

Uroporphyrin-III C-methyltransferase (EC 2.1.1.107) (SUMT) [1,2] catalyzes the transfer of two methyl groups from
S-adenosyl-L-methionine to the C-2 and C-7 atoms of uroporphyrinogen III to yield precorrin-2 via the intermediate
formation of precorrin-1.

SUMT is the first enzyme specific to the cobalamin pathway and precorrin-2 is a common intermediate in the biosynthesis
of corrinoids such as vitamin B12, siroheme and coenzyme F430. The sequences of SUMT from a variety of
eubacterial and archaebacterial species are currently available. In species such as Bacillus megaterium (gene cobA).
domains 8 and 9. Two patterns have been developed to detect this family of proteins. The first pattern is based on the
G-R-[KR] motif; but because this motif is too short to be specific to this family of proteins, a pattern from a larger region
centered on the second copy of this motif was derived. The second pattern is based on a number of conserved residues
which are located at the end of the fourth transmembrane segment and in the short loop region between the fourth
and fifth segments.

Consensus pattern: [LIVMSTAG]-[LIVMFSAG]-x(2)-[LIVMSA]-[DE]-x-[LIVMFYWA]-G-R-[RK]-x(4,6)-[GSTA]
Consensus pattern: [LIVMF]-x-G-[LIVMFA]-x(2)-G-x(8)-[LIFY]-x(2)-[EQ]-x(6)-[RK]

[1] Silverman M. Annu. Rev. Biochem. 60:757-794(1991).
[2] Gould G.W., Bell G.I. Trends Biochem. Sci. 15:18-23(1990).
[3] Baldwin S.A. Biochim. Biophys. Acta 1154:17-49(1993).
[4] Maiden M.C.J., Davis E.O., Baldwin S.A., Moore D.C.M., Henderson P.J.F. Nature 325:641-643(1987).
[5] Henderson P.J.F. Curr. Opin. Struct. Biol. 1:590-601(1991).
[6] Culham D.E., Lasby B., Marangoni A.G., Milner J.L., Steer B.A., van Nues R.W., Wood J.M. J. Mol. Biol.
229:268-276(1993).

[1515] 633. Synaptobrevin signature 

Synaptobrevin [1] is an intrinsic membrane protein of small synaptic vesicles whose function is not yet known, but
which is highly conserved in mammals, electric ray (where it is known as VAMP-1), Drosophila and yeast [2]. In yeast
there are two closely related forms of synaptobrevin (genes SNC1 and SNC2) while in mammals there are at least four
(genes SYB1, SYB2, SYB3 and SYBL1). Structurally, synaptobrevin consists of an N-terminal cytoplasmic domain of from
90 to 110 residues, followed by a transmembrane region, and then by a short (from 2 to 22 residues) C-terminal
intravesicular domain. As a signature pattern for synaptobrevin, a highly conserved stretch of residues located in the central
part of the sequence was selected.

Consensus pattern: N-[LIVM]-[DENS]-[KL]-V-x-[DEQ]-R-x(2)-[KR]-[LIVM]-[STDE]-x-[LIVM]-x-[DE]-[KR]-[TA]-[DE]
[1] Suedhof T.C., Baumert M., Perin M.S., Jahn R. Neuron 2:1475-1481(1989).
[2] Gerst J.E., Rodgers L., Riggs M., Wigler M. Proc. Natl. Acad. Sci. U.S.A. 89:4338-4342(1992).

[1516] 634. TBC domain. Identification of a TBC domain in GYP6_YEAST and GYP7_YEAST, which are GTPase 
activator proteins of yeast Ypt6 and Ypt7, implies that these domains are GTPase activator proteins of Rab-like small
GTPases. Number of members: 55 

[1] Medline: 96032578. Molecular cloning of a cDNA with a novel domain present in the tre-2 oncogene and the 
yeast cell cycle regulators BUB2 and cdc16. Richardson PM, Zon LI; Oncogene 1995;11:1139-1148. 
[2] Medline: 97398935. A shared domain between a spindle assembly checkpoint protein and Ypt/Rab-specific
GTPase-activators. Neuwald AF; Trends Biochem Sci 1997;22:243-244. 

[1517] 635. Transcription factor TFIID repeat signature (TBP) 

Transcription factor TFIID (or TATA-binding protein, TBP) [1,2] is a general factor that plays a major role in the activation
of eukaryotic genes transcribed by RNA polymerase II. TFIID binds specifically to the TATA box promoter element
which lies close to the position of transcription initiation. There is a remarkable degree of sequence conservation of a
C-terminal domain of about 180 residues in TFIID from various eukaryotic sources. This region is necessary and sufficient
for TATA box binding. The most significant structural feature of this domain is the presence of two conserved
repeats of a 77 amino-acid region. The intramolecular symmetry generates a saddle-shaped structure that sits astride
the DNA [3]. Drosophila TRF (TBP-related factor) [4] is a sequence-specific transcription factor that also binds to the
TATA box and is highly similar to TFIID. Archaebacteria also possess a TBP homolog [5]. A signature pattern that
spans the last 50 residues of the repeated region has been derived.
Consensus pattern: Y-x-P-x(2)-[IF]-x(2)-[LIVM](2)-x-[KRH]-x(3)-P-[RKQ]-x(3)-L-[LIVM]-F-x-[STN]-G-[KR]-[LIVM]-x
(3)-G-[TAGL]-[KR]-x(7)-[AGC]-x(7)-[LIVM]
[1] Hoffmann A., Sinn E., Yamamoto T., Wang J., Roy A., Horikoshi M., Roeder R.G. Nature 346:387-390(1990).
[2] Gasch A., Hoffmann A., Horikoshi M., Roeder R.G., Chua N.-H. Nature 346:390-394(1990).
[3] Nikolov D.B., Hu S.-H., Lin J., Gasch A., Hoffmann A., Horikoshi M., Chua N.-H., Roeder R.G., Burley S.K.
Nature 360:40-46(1992).
[4] Crowley T.E., Hoey T., Liu J.-K., Jan Y.N., Jan L.Y., Tjian R. Nature 361:557-561(1993).
[5] Marsh T.L., Reich C.I., Whitelock R.B., Olsen G.J. Proc. Natl. Acad. Sci. U.S.A. 91:4180-4184(1994).

[1518] 636. Translationally controlled tumor protein signatures (TCTP) 

Mammalian translationally controlled tumor protein (TCTP) (or P23) is a protein which has been found to be preferentially
synthesized in cells during the early growth phase of some types of tumor [1,2], but which is also expressed in
normal cells. The physiological function of TCTP is still not known. It is a hydrophilic protein of 18 to 20 Kd. Close
homologs have been found in plants [3], earthworm [4], Caenorhabditis elegans (F52H2.11), Hydra, budding yeast
(YKL056c) [5] and fission yeast (SpAC1F12.02c). Two of the best conserved regions have been selected as signature
patterns for TCTP.

Consensus pattern: [IFA]-[GA]-[GAS]-N-[PAK]-S-[GA]-E-[GDE]-[PAGE]-[DEQGA]

Consensus pattern: [FLVH]-[FY]-[IVCT]-G-E-x-[MA]-x(2,5)-[DEN]-[GAST]-x-[LV]-[AV]-x(3)-[FYW]
Consensus pattern: [GS]-x-[LIVMF]-x(2)-A-[DNEQASH]-[GNEK]-G-[STIM]-[LIVMFY](3)-[DE]-[EK]-[LIVM]
Consensus pattern: [FYW]-P-[GS]-N-[LIVM]-R-[EQ]-L-x-[NHAT]

[1] Morett E., Segovia L. J. Bacteriol. 175:6067-6074(1993).
[2] Austin S., Kundrot C., Dixon R. Nucleic Acids Res. 19:2281-2287(1991).
[3] Albright L.M., Huala E., Ausubel F.M. Annu. Rev. Genet. 23:311-336(1989).
[4] Austin S., Dixon R. EMBO J. 11:2219-2228(1992).
[1506] 625. Sigma-70 factors family signatures 

Sigma factors [1] are bacterial transcription initiation factors that promote the attachment of the core RNA polymerase
to specific initiation sites and are then released. They alter the specificity of promoter recognition. Most bacteria express
a multiplicity of sigma factors. Two of these factors, sigma-70 (gene rpoD), generally known as the major or primary
sigma factor, and sigma-54 (gene rpoN or ntrA) direct the transcription of a wide variety of genes. The other sigma
factors, known as alternative sigma factors, are required for the transcription of specific subsets of genes. With regard
to sequence similarity, sigma factors can be grouped into two classes: the sigma-54 and sigma-70 families. The sigma-70
family includes, in addition to the primary sigma factor, a wide variety of sigma factors, some of which are listed
below. - Bacillus sigma factors involved in the control of sporulation-specific genes: sigma-E (sigE or spoIIGB), sigma-F
(sigF or spoIIAC), sigma-G (sigG or spoIIIG), sigma-H (sigH or spoOC) and sigma-K (sigK or spoIVCB/spoIIIC). -
Escherichia coli and related bacteria sigma-32 (gene rpoH or htpR) involved in the expression of heat shock genes. -
Escherichia coli and related bacteria sigma-27 (gene fliA) involved in the expression of the flagellin gene. - Escherichia
coli sigma-S (gene rpoS or katF) which seems to be involved in the expression of genes required for protection against
external stresses. - Myxococcus xanthus sigma-B (sigB) which is essential for the late-stage differentiation of that
bacterium. Alignments of the sigma-70 family permit the identification of four regions of high conservation [2,3]. Each of
these four regions can in turn be subdivided into a number of sub-regions. Signature patterns based on the two best-conserved
sub-regions have been developed. The first pattern corresponds to sub-region 2.2; the exact function of this
sub-region is not known although it could be involved in the binding of the sigma factor to the core RNA polymerase.
The second pattern corresponds to sub-region 4.2 which seems to harbor a DNA-binding 'helix-turn-helix' motif involved
in binding the conserved -35 region of promoters recognized by the major sigma factors. The second pattern starts one
residue before the N-terminal extremity of the HTH region and ends six residues after its C-terminal extremity.
Consensus pattern: [DE]-[LIVMF](2)-[HEQS]-x-G-x-[LW

Consensus pattern: [STN]-x(2)-[DEQ]-[LIVM]-[GAS]-x(4)-[LIVMF]-[PSTG]-x(3)-[LIVMA]-x-[NQR]-[LIVMA]-[EQH]-x
(3)-[LIVMFW]-x(2)-[LIVM]

[1] Helmann J.D., Chamberlin M.J. Annu. Rev. Biochem. 57:839-872(1988).
[2] Gribskov M., Burgess R.R. Nucleic Acids Res. 14:6745-6763(1986).
[3] Lonetto M.A., Gribskov M., Gross C.A. J. Bacteriol. 174:3843-3849(1992).
[4] Lonetto M.A., Brown K.L., Rudd K.E., Buttner M.J. Proc. Natl. Acad. Sci. U.S.A. 91:7573-7577(1994).
[1507] 626. Signal carboxyl-terminal domain. 430 members. 
[1508] 627. Signal peptidases I signatures 

Signal peptidases (SPases) [1] (also known as leader peptidases) remove the signal peptides from secretory proteins.
In prokaryotes three types of SPases are known: type I (gene lepB), which is responsible for the processing of the
majority of exported pre-proteins; type II (gene lsp), which only processes lipoproteins; and a third type involved in the
processing of pili subunits. SPase I is an integral membrane protein that is anchored in the cytoplasmic membrane by
one (in B. subtilis) or two (in E. coli) N-terminal transmembrane domains, with the main part of the protein protruding into
the periplasmic space. Two residues have been shown [2,3] to be essential for the catalytic activity of SPase I: a serine
and a lysine. SPase I is evolutionarily related to the yeast mitochondrial inner membrane protease subunits 1 and 2
(genes IMP1 and IMP2), which catalyze the removal of signal peptides required for the targeting of proteins from the
mitochondrial matrix, across the inner membrane, into the inter-membrane space [4]. In eukaryotes the removal of
signal peptides is effected by an oligomeric enzymatic complex composed of at least five subunits: the signal peptidase
complex (SPC). The SPC is located in the endoplasmic reticulum membrane. Two components of mammalian SPC,
the 18 Kd (SPC18) and the 21 Kd (SPC21) subunits, as well as the yeast SEC11 subunit, have been shown [5] to share
regions of sequence similarity with prokaryotic SPases I and yeast IMP1/IMP2. Three signature patterns for these
proteins have been developed. The first signature contains the putative active site serine; the second signature contains
the putative active site lysine, which is not conserved in the SPC subunits; and the third signature corresponds to a
conserved region of unknown biological significance which is located in the C-terminal section of all these proteins.
Consensus pattern: [GS]-x-S-M-x-[PS]-[AT]-[LF] [S is an active site residue] 

Consensus pattern: K-R-[LIVMSTA](2)-G-x-[PG]-G-[DE]-x-[LIVM]-x-[LIVMFY] [K is an active site residue]
Consensus pattern: [LIVMFYW](2)-x(2)-G-D-[NH]-x(3)-[SND]-x(2)-[SG]
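
The consensus patterns in this listing are written in PROSITE-style notation: elements are separated by hyphens, x stands for any residue, square brackets list the residues allowed at a position, and a number in parentheses gives a repeat count. As a minimal sketch, and not part of the original disclosure, such a pattern can be translated into a regular expression and used to scan a sequence. The function name prosite_to_regex and the example sequence are hypothetical, and the curly-brace exclusion form is handled on the assumption that it follows the usual PROSITE convention.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression."""
    parts = []
    # Drop a trailing period or dangling hyphen, then split into elements.
    for element in pattern.strip().rstrip(".-").split("-"):
        # Separate an optional repeat suffix such as (2) or (4,5) from the element.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"empty or malformed element in {pattern!r}")
        core, low, high = m.groups()
        if core == "x":
            token = "."                       # any residue
        elif core.startswith("[") and core.endswith("]"):
            token = core                      # allowed residues
        elif core.startswith("{") and core.endswith("}"):
            token = "[^" + core[1:-1] + "]"   # excluded residues
        elif len(core) == 1 and core.isalpha() and core.isupper():
            token = core                      # one fixed residue
        else:
            raise ValueError(f"unrecognized element: {element!r}")
        if low is not None:
            token += "{" + low + ("," + high if high else "") + "}"
        parts.append(token)
    return "".join(parts)


# Hypothetical sequence containing one match to the first SPase I signature.
regex = prosite_to_regex("[GS]-x-S-M-x-[PS]-[AT]-[LF]")
hit = re.search(regex, "AAGASMEPTLAA")
print(regex, hit.start() if hit else None)
```

An element that the translator does not recognize raises an error rather than being silently skipped, which is useful with text such as this listing, where a damaged bracket would otherwise change the meaning of a pattern.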

[ 1] Dalbey R.E., von Heijne G. Trends Biochem. Sci. 17:474-478(1992). [ 2] Sung M., Dalbey R.E. J. Biol. Chem.
267:13154-13159(1992). [ 3] Black M.T. J. Bacteriol. 175:4957-4961(1993). [ 4] Nunnari J., Fox T.D., Walter P. Science
262:1997-2004(1993). [ 5] van Dijl J.M., de Jong A., Vehmaanpera J., Venema G., Bron S. EMBO J. 11:2819-2828(1992).
[ 6] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:19-61(1994). [E1]
[1509] 628. (sodcu) Copper/Zinc superoxide dismutase signatures 
[1505] 624. Sigma-54 interaction domain signatures and profile
Consensus pattern: [LIVMFY](3)-x-G-[DEQ]-[STE]-G-[STAV]-G-K-x(^HLIVM^ P
[1498] 620. Protein secY signatures

The eubacterial secY protein [1] plays an important role in protein export. It interacts with the signal sequences of
secretory proteins as well as with two other components of the protein translocation system: secA and secE. SecY is
an integral plasma membrane protein of 419 to 492 amino acid residues that apparently contains ten transmembrane
segments. Such a structure probably confers on secY a 'translocator' function, providing a channel for periplasmic and
outer-membrane precursor proteins. Homologs of secY are found in archaebacteria [2]. SecY is also encoded in the
chloroplast genome of some algae [3], where it could be involved in a prokaryotic-like protein export system across the
two membranes of the chloroplast endoplasmic reticulum (CER), which is present in chromophyte and cryptophyte algae.
Two signature patterns have been developed for secY proteins. The first corresponds to the second transmembrane 
region, which is the most conserved section of these proteins. The second spans the C-terminal part of the fourth 
transmembrane region, a short intracellular loop, and the N-terminal part of the fifth transmembrane region. 
Consensus pattern: [GST]-{LIVMF](2)-x-[LI\M]-G-[LIVM]-x-P-[LiW^ 
(2) 

Consensus pattern: [LiVMFYVV](2)-x-[DEhx-[LIVMFH^^ 

[ 1] Ito K. Mol. Microbiol. 6:2423-2428(1992). [ 2] Auer J., Spicker G., Boeck A. Biochimie 73:683-688(1991). [ 3] Douglas
S.E. FEBS Lett. 298:93-96(1992).

[1499] 621. (Seed protein) Small hydrophilic plant seed proteins signature. The following small hydrophilic plant seed
proteins are structurally related: - Arabidopsis thaliana proteins GEA1 and GEA6. - Cotton late embryogenesis abundant
(LEA) protein D-19. - Carrot EMB-1 protein. - Barley LEA proteins B19.1A, B19.1B, B19.3 and B19.4. - Maize late
embryogenesis abundant protein Emb564. - Radish late seed maturation protein p8B6. - Rice embryonic abundant
protein Emp1. - Sunflower 10 Kd late embryogenesis abundant protein (DS10). - Wheat Em proteins. These proteins
contain from 83 to 153 amino acid residues and may play a role [1,2] in equipping the seed for survival, maintaining
a minimal level of hydration in the dry organism and preventing the denaturation of cytoplasmic components. They
may also play a role during imbibition by controlling water uptake. As a signature pattern, the best conserved region
in the sequence of these proteins has been selected; it is a glycine-rich nonapeptide located in the N-terminal section.
[1500] Consensus pattern: G-[EQ]-T-V-V-P-G-G-T- 

[1501] [ 1] Dure L. III, Crouch M., Harada J., Ho T.-H. D., Mundy J., Quatrano R., Thomas T., Sung Z.R. Plant Mol.
Biol. 12:475-486(1989). [ 2] Gaubier P., Raynal M., Hull G., Huestis G.M., Grellet F., Arenas C., Pages M., Delseny M.
Mol. Gen. Genet. 238:409-418(1993).
[1502] 622. Serine carboxypeptidases, active sites 

All known carboxypeptidases are either metallo carboxypeptidases or serine carboxypeptidases. The catalytic activity
of the serine carboxypeptidases, like that of the trypsin family serine proteases, is provided by a charge relay system
involving an aspartic acid residue hydrogen-bonded to a histidine, which is itself hydrogen-bonded to a serine [1].
Proteins known to be serine carboxypeptidases are: - Barley and wheat serine carboxypeptidases I, II, and III [2]. -
Yeast carboxypeptidase Y (YSCY) (gene PRC1), a vacuolar protease involved in degrading small peptides. - Yeast
KEX1 protease, involved in killer toxin and alpha-factor precursor processing. - Fission yeast sxa2, a probable
carboxypeptidase involved in degrading or processing mating pheromones [3]. - Penicillium janthinellum carboxypeptidase
S1 [4]. - Aspergillus niger carboxypeptidase pepF. - Aspergillus saitoi carboxypeptidase cpdS. - Vertebrate protective
protein / cathepsin A [5], a lysosomal protein which is not only a carboxypeptidase but also essential for the activity of
both beta-galactosidase and neuraminidase. - Mosquito vitellogenic carboxypeptidase (VCP) [6]. - Naegleria fowleri
virulence-related protein Nf314 [7]. - Yeast hypothetical protein YBR139w. - Caenorhabditis elegans hypothetical
proteins C08H9.1, F13D12.6, F32A5.3, F41C3.5 and K10B2.2. This family also includes: - Sorghum (s)-hydroxymandelonitrile
lyase (hydroxynitrile lyase) (HNL) [8], an enzyme involved in plant cyanogenesis. The sequences surrounding
the active site serine and histidine residues are highly conserved in all these serine carboxypeptidases.
Consensus pattern: [LIVM]-x-[GTA]-E-S-Y-[AG]-[GS] [S is the active site residue]

Consensus pattern: [LIVF]-x(2)-[LIVSTA]-x-[IVPST]-x-[GSDNQL]-[SAGV]-[SG]-H-x-[IVAQ]-P-x(3)-[PSA] [H is the active site residue]
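
Because each of these patterns marks a specific catalytic residue, a match can be used to report the position of that residue rather than only the start of the matched region. The sketch below illustrates this for the serine signature, read here as [LIVM]-x-[GTA]-E-S-Y-[AG]-[GS] and translated by hand into a regular expression with a capture group around the serine. The function name and the test sequence are hypothetical and serve only as an illustration.

```python
import re

# Hand translation of [LIVM]-x-[GTA]-E-S-Y-[AG]-[GS];
# the capture group marks the active-site serine.
SERINE_SIGNATURE = re.compile(r"[LIVM].[GTA]E(S)Y[AG][GS]")


def active_site_serines(sequence):
    """Return the 1-based position of the marked serine for each match."""
    # finditer reports non-overlapping matches, scanning left to right.
    return [m.start(1) + 1 for m in SERINE_SIGNATURE.finditer(sequence)]


# Hypothetical sequence fragment containing one copy of the signature.
print(active_site_serines("MKLAGESYAGTT"))
```

Positions are reported 1-based, following the usual convention for numbering residues in a protein sequence.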

[ 1] Liao D.I., Remington S.J. J. Biol. Chem. 265:6528-6531(1990). [ 2] Sorensen S.B., Svendsen I., Breddam K.
Carlsberg Res. Commun. 54:193-202(1989). [ 3] Imai Y., Yamamoto M. Mol. Cell. Biol. 12:1827-1834(1992). [ 4] Svendsen
I., Hofmann T., Endrizzi J., Remington J., Breddam K. FEBS Lett. 333:39-43(1993). [ 5] Galjart N.J., Morreau
H., Willemsen R., Gillemans N., Bonten E.J., d'Azzo A. J. Biol. Chem. 266:14754-14762(1991). [ 6] Cho W.L., Deitsch
K.W., Raikhel A.S. Proc. Natl. Acad. Sci. U.S.A. 88:10821-10824(1991). [ 7] Hu W.N., Kopachik W., Band R.N. Infect.
Immun. 60:2418-2424(1992). [ 8] Wajant K., Mundry K.W., Pfitzenmaier K. Plant Mol. Biol. 26:735-746(1994). [ 9] Rawlings
N.D., Barrett A.J. Meth. Enzymol. 244:19-61(1994). [E1]

[1503] 623. Serpins signature. Serpins (SERine Proteinase INhibitors) [1,2,3,4] are a group of structurally related
proteins. They are high molecular weight (400 to 500 amino acids), extracellular, irreversible serine protease inhibitors
with a well defined structural-functional characteristic: a reactive region that acts as a 'bait' for an appropriate serine
protease. This region is found in the C-terminal part of these proteins. Proteins which are known to belong to the serpin
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transferase family Glycos transf i ^ ""^ °' ,he sucrose belongs to the glycosy. 

[1490] 614. Sulfotransferase proteins 
Number of members: 59 

[1491] 615. Synaptophysin / synaptoporin signature 

r^sr^ 

N- and C-termini sequences seem™ be ^27^?,? T membrane tour ** ,heir 
highly conservedregion located in thebe 9 ^Th2tSS S £i«,tr ? ** ** ^ °' pr ° ,eins ' a 

has been selected This region contains a cv^J™ r« w, ! h ? ! rloo Pi us «afterthe first transmembrane domain 
Consensus pattern: L-S-vK 55 T £ „T« hf ? ?■ ™ V * Mf * m a disuffide 
Scherer H.. Ltz H. NeuroJ W^MO) " ' 1 11 P " Ma ^ e ^ouey B.. 

[1492] 616. Syndecans signature 

variety o, mdecul/s v* **** ™* 

proteins consist of four separate domains- a) AsimalTe^^ m » T * 88 <xwece P« 0f s. Structurally, these 
length and whosesequenc^ is not ev^ La^ 

the sites of attachment of the heparan sulfate m^mlooi^ciH i ^yndecans. The ectodomain contains 
conserved cytoplasmic d^ainoTaS^ 

known to belong to this family are- - Syndecan i <Z^f^T?* cy,oskeletal W™*- The proteins 
syndecan. - Syndecan 4 or amph^canTry^dcin ^ll^^r ' 3 ° f neUro ^ can ° r N " 

decan (F57C7.3).The signature pattern thaT ha^L J? ' ' Caenomabdi «s elegans probable syn- 

transmembrane region and incLe "^e 2? 10 Lfd " ^ P ^ , *° bst fesidue * 

basic residues, cou.d act as a Z ?rTnsSe This re 9 ion ' *** four 

Consensus pattern: [FY]-R-[IMHKR]-K(2)-D-E-G-S-Y 

Sfi^^KSS^Sj 1 - Ga,t,RL ' taeEJ - A ™ °"« — » 

[1493] 617. Syntaxin / epimorphin family signature 

The following proteins have been shown to be evolutionary related 11 ? <n- cv k- , 

mesenchymal protein which plays an essential lotehJS^SS^^ * Ep,n,0,ph,n < or svntaxi n 2). a mammalian 
HPC-1 ) and syntaxin 1 B which are •ynSn^S^^TT?* " 1 A ( also as antigen 

napticactivezones. -Syntax* 3. - sJS£^^££S? " ° f Synaptic vesicles at W 

active zone, - Syntaxi), 5. which melT^n^S 

mtracellular vesicle trafficking. - Syntaxin 7 -^SSi1^f^^? n8pCrt - " Syn,3Xin 6 ' which fe invo,ved * 
to the vacuole. - Yeast SED5 whic ^is^ Jred tlTl w ( ? S6) Wh,ch ,S r0quired ,or ,he traf *Port of proteases 
andSSOawhicharereauired,™;"^.^ Z ^^""^"^^^^com* ex. -Yeast SS01 
assembr, - Arabic^ tha.iana'c^ 

hypothetical proteins F35C8.4, F48F7 2 F55A11 2 and ToT^ii ? Cytoklnesis - " Caenorhabditis elegans 
istics: a size ranging f rom30 Kd to 40 Kd- «C^ mi n a Tlv. J J? * pro,eins share 106 **"*«g character- , 
in anchors the prolein to^ " ^ ***** — is P«A involved 

mation. The pattern specr K for this famX fetes^^ 

Consensus pattern pa-jS SfSSUSf ™!f I* ° f 9,8 ""^ °°" domai "- 

[1494] 618. Sm protein 

the major splfceosomal small nuclear RlJte^^SZ^^^!^ ^ ^ Sm Ste Pfesent in fouf °« 
and Sm2, separated by a short variable linter COmmon ^ uence ^« h two segments. Sml 

1 999,-96:37^17. ' OU " 9 R ' M ' * !a Forte!te E - ^ VA. Luhrmann R, Li J, Nagai K; Cel. 

[1496] 619. Skp1 family

[1497] [1] Stebbins CE, Kaelin WG Jr, Pavletich NP; Science 1999;284:455-461.
[1460] [1] The three-dimensional structure of canavalin from jack bean (Canavalia ensiformis). Ko TP, Ng JD, McPherson
A; Plant Physiol 1993;101:729-744.

[1481] 608. Aspartate-semialdehyde dehydrogenase signature 

Aspartate-semialdehyde dehydrogenase (ASD) catalyzes the second step in the common biosynthetic pathway leading
from Asp to diaminopimelate and Lys, to Met, and to Thr: the NADP-dependent reductive dephosphorylation of
L-aspartyl phosphate to L-aspartate-semialdehyde. In bacteria and fungi, ASD is a protein of about 40 Kd (340 to 370
residues) whose sequence is not extremely well conserved [1]. A conserved cysteine residue has been implicated as
important for the catalytic activity [2]. The region of conservation around the active site residue is too small to be used
as a signature pattern. Another, more conserved region, located in the last third of the sequence and containing
both a conserved cysteine and a histidine, has been used instead.
Consensus pattern: [LIVM]-[SADN]-x(2)-C-x-R-[LIVM]-x(4)-[GSC]-H-[STA]

[ 1] Baril C., Richaud C., Foumi E., Baranton G., Saint Girons I. J. Gen. Microbiol. 138:47-53(1992). [ 2] Karsten W.E.,
Viola R.E. Biochim. Biophys. Acta 1121:234-238(1992).

[1482] N-acetyl-gamma-glutamyl-phosphate reductase active site. N-acetyl-gamma-glutamyl-phosphate reductase
(EC 1.2.1.38) (AGPR) [1,2] is the enzyme that catalyzes the third step in the biosynthesis of arginine from glutamate,
the NADP-dependent reduction of N-acetyl-5-glutamyl phosphate into N-acetylglutamate 5-semialdehyde. In bacteria
it is a monofunctional protein of 35 to 38 Kd (gene argC), while in fungi it is part of a bifunctional mitochondrial enzyme
(gene ARG5,6, arg11 or arg-6) which contains an N-terminal acetylglutamate kinase (EC 2.7.2.8) domain and a C-terminal
AGPR domain. In the Escherichia coli enzyme, a cysteine has been shown to be implicated in the catalytic activity; the
region around this residue is well conserved and can be used as a signature pattern.

Consensus pattern: [LIVM]-[GSA]-x-P-G-C-[FY]-[AVP]-T-[GA]-x(3)-[GTAC]-[LIVM]-x-P [C is the active site residue] 

[ 1] Ludovice M., Martin J.R., Carrachas P., Liras P. J. Bacteriol. 174:4606-4613(1992). [ 2] Gessert S.F., Kim J.H.,
Nargang F.E., Weiss R.L. J. Biol. Chem. 269:8189-8203(1994).

[1483] 609. Sialyltransferase family

Number of members: 18 

[1484] 610. SpoU rRNA Methylase family 

This family of proteins probably use S-AdoMet. Number of members: 58 

[1] SpoU protein of Escherichia coli belongs to a new family of putative rRNA methylases. Koonin EV, Rudd KE; Nucleic
Acids Res 1993;21:5519-5519. [2] The spoU gene of Escherichia coli, the fourth gene of the spoT operon, is essential
for tRNA (Gm18) 2'-O-methyltransferase activity. Persson BC, Jager G, Gustafsson C; Nucleic Acids Res 1997;25:
4093-4097.

[1485] 611. Stathmin family signatures

Stathmin [1] (from the Greek 'stathmos', which means relay) is a ubiquitous intracellular protein, present in a variety
of phosphorylated forms, which serves as a relay for diverse second messenger pathways. Its expression and
phosphorylation are regulated throughout development and in response to extracellular signals regulating cell
proliferation, differentiation and function. Stathmin is a highly conserved protein of 149 amino acid residues. Structurally, it
consists of an N-terminal domain of about 45 residues followed by a 78 residue alpha-helical domain consisting of a
heptad repeat coiled coil structure and a C-terminal domain of 25 residues. Protein SCG10 is a neuron-specific,
membrane-associated protein that accumulates in the growth cones of developing neurons. It is highly similar in its sequence
to stathmin, but differs in that it contains an additional N-terminal hydrophobic segment of 32 residues which is probably
responsible for its interaction with membranes. Xenopus protein XB3 is also evolutionarily related to stathmin and also
contains an additional N-terminal hydrophobic domain [2]. A conserved decapeptide which ends with the first three
residues of the coiled coil domain and a second pattern that corresponds to part of the central region of the coiled coil
have been selected as signatures for proteins of the stathmin family.
Consensus pattern: P-[KRQ]-[KR](2)-[DE]-x-S-L-[EG]-E-
Consensus pattern: A-E-K-R-E-H-E-[KR]-E- 
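
Where a family is described by more than one signature, a candidate sequence can be checked against each of them and the hits reported together. The following is a minimal sketch of that idea for the two stathmin patterns, not a method taken from the original disclosure. Both patterns end in a hyphen as printed and may be truncated, so the sketch uses only the elements that are printed (the first read as P-[KRQ]-[KR](2)-[DE]-x-S-L-[EG]-E). The dictionary keys, function names and test sequence are hypothetical.

```python
import re

# Hand translations of the two printed stathmin signatures.
STATHMIN_SIGNATURES = {
    "decapeptide": re.compile(r"P[KRQ][KR]{2}[DE].SL[EG]E"),
    "coiled_coil": re.compile(r"AEKREHE[KR]E"),
}


def signature_hits(sequence):
    """Map each signature name to the 1-based start positions of its matches."""
    return {
        name: [m.start() + 1 for m in rx.finditer(sequence)]
        for name, rx in STATHMIN_SIGNATURES.items()
    }


def has_all_signatures(sequence):
    """True only if every signature matches at least once."""
    return all(signature_hits(sequence).values())


# Hypothetical sequence built to contain one copy of each signature.
example = "MMPKRKDASLEEGGAEKREHEKE"
print(signature_hits(example), has_all_signatures(example))
```

Requiring every signature is a stricter choice than accepting any single hit; which rule is appropriate depends on how the patterns are to be used.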

[ 1] Sobel A. Trends Biochem. Sci. 16:301-305(1991). [ 2] Maucuer A., Moreau J., Mechali M., Sobel A. J. Biol. Chem.
268:16420-16429(1993).

[1486] 612. SUA5/yciO/yrdC family signature. The following uncharacterized proteins have been shown [1] to share
regions of similarities: - Yeast protein SUA5. - Escherichia coli hypothetical protein yciO and HI1198, the corresponding
Haemophilus influenzae protein. - Escherichia coli hypothetical protein yrdC and HI0656, the corresponding
Haemophilus influenzae protein. - Bacillus subtilis hypothetical protein ywlC. - Mycobacterium leprae hypothetical protein in
rfe-hemK intergenic region. - Methanococcus jannaschii hypothetical protein MJ0062. These are proteins of 20
to 46 Kd which contain a number of conserved regions in their N-terminal section. They can be picked up in the database
by the following pattern.

[1487] Consensus pattern: [LIVMTA](3)-[LIVMFYC]-[PG]-T-[DE]-[STA]-x-[FY]-[GA]-[LIVM]-[GS]-

[1488] [ 1] Bairoch A., Rudd K.E., Robison K. Unpublished observations (1995).

[1489] 613. Sucrose synthase 1 
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1998,258:397-403 Brassicaceae. Sakamoto K, Kusaba M. Nishio T; Mol Gen Genet 

[1474] 60a (sdh cyt) Succinate dehydrogenase cytochrome b subunit signatures 

of a cytochrome B and a membrane'anZ Z^T^ZT^ 

protein |1.2,3J belonging to a family that grows- - cJcTh^Th , COmponent - ,s a mono he ™ transmembrane 

b560 from the mamlL rttoj&'gfc^ ^X^SSSi eZST " 

nome of some algae and in the plant Marchsntia M^nl! encoded in the mitochondrial ge- 

(gene SDH3 or C?B3). - P^c" T!^SSSSS^2S^ " mttochoodrial s ™ ""i 

comprise threetransmembraneregions^ere^tolln^ 
S Jwosig r re^ 

Consensus pattern: ^UVMT>*<^vil*x<6^^ h ,. 

\fent Riet 

n^TSS: S^am C i, RiChard ^ G - G ^^-9er J.M.. Uoareg B. J. Mo,. Bio,. 250:484-495(1995). 

Lr^rS *££2,2r** iWO ' Ved h ^ - ^nera, secret. Malawi N, 

Number of members: 40 

[1476] 605. Protein secE/sec61-gamma signature
Cr^ 

through the endoplasmic reticulum- ft is ™a 7^te^^2T? ^™ P ^ t( * e m P'^" translocation 
secE and sec61-gamma are Jp^ SS^?* * SeC61 ' a ' Pha and ^ (1J ** 
at their C-terminal extremity <EscheridIJ22?i. ? ** C ° nta ' n 3 sin9 ' e '^membrane region 

60residuesthat ccn^ZZ^^Z^^^ ** " P 0 *** 8 ex,ra ■""»*- ^ent of 
ly we., conserved, however it 18 ^™^ a s^fu^ 

residues before the beginning of ^,ransmemb"L oSn ^ ^ °" 8 prD,ine "«« 10 

hexameric structures [1,21. Each of the subuniis in th* h«v=Z> \ S are "^^y 13 ^ Pfoteins which form 

^^^^^^ ><- 

sulfide bond.-: position of the pattern. Proteins that betono to the 11 <Tiwh cvs,eine Evolved in a <fl- 

cruciferin. rice glutelins, ooBon^^^.cS.^SrSJf pea and broad bean legumins, rape 

anthinin G3. etc. The region thai IctS he SZSS "~* ^ Stobulm - ^nowerhX 

G V) and a proximal cysteine reSdu wS, ^SS^S^SiT^T *" ^ 8nd tesk: subunite 
pattem for this family of proteins. ercna,n a,suff,de ^ "«» been used as a signature 

£ SIT """^ " """ « The 7S «o, S9 . 

Number of members: 67 
(PP2A) is also an enzyme of broad specificity. PP2A is a trimeric enzyme that consists of a core composed of a catalytic
subunit associated with a 65 Kd regulatory subunit and a third variable subunit. In mammals, there are two closely
related isoforms of the catalytic subunit of PP2A: PP2A-alpha and PP2A-beta, encoded by separate genes. - Protein
phosphatase-2B (PP2B or calcineurin), a calcium-dependent enzyme whose activity is stimulated by calmodulin. It is
composed of two subunits: the catalytic A-subunit and the calcium-binding B-subunit. The specificity of PP2B is
restricted. In addition to the above-mentioned enzymes, some additional serine/threonine specific protein phosphatases
have been characterized and are listed below. - Mammalian phosphatase-X (PP-X) and Drosophila phosphatase-V
(PP-V), which are closely related to but distinct from PP2A. - Yeast phosphatase PPH3, which is similar to PP2A, but
with different enzymatic properties. - Drosophila phosphatase-Y (PP-Y) and yeast phosphatases Z1 and Z2 (genes
PPZ1 and PPZ2), which are closely related to but distinct from PP1. - Drosophila retinal degeneration protein C (gene
rdgC), a calcium-binding phosphatase required to prevent light-induced retinal degeneration. - Phages Lambda and
Phi-80 ORF-221, which have been shown to have phosphatase activity and are related to mammalian PP's. The best
conserved region in these proteins is a perfectly conserved pentapeptide that can be used as a signature pattern.
Consensus pattern: [LIVM]-R-G-N-H-E-

[ 1] Cohen P. Annu. Rev. Biochem. 58:453-508(1989). [ 2] Cohen P., Cohen P.T.W. J. Biol. Chem. 264:21435-21438
(1989). [ 3] Cohen P.T.W., Brewis N.D., Hughes V., Mann D.J. FEBS Lett. 268:355-359(1990).
[1470] 600. Translation initiation factor SUI1 signature 

In budding yeast (Saccharomyces cerevisiae), SUI1 is a translation initiation factor that functions in concert with eIF-2
and the initiator tRNA-Met in directing the ribosome to the proper start site of translation [1]. SUI1 is a protein of 108
residues. Close homologs of SUI1 have been found [2] in mammals, insects and plants. SUI1 is also evolutionarily
related to hypothetical proteins from Escherichia coli (yciH), Haemophilus influenzae (HI1225) and Methanococcus
vannielii. A conserved region in the C-terminal section has been selected as a signature pattern.
Consensus pattern: [LIVM]-[EQ]-[LIVM]-Q-G-[DEN]-[KHQ]-[KRV]

[ 1] Yoon H., Donahue T.F. Mol. Cell. Biol. 12:248-260(1992). [ 2] Fields C.A., Adams M.D. Biochem. Biophys. Res.
Commun. 198:288-291(1994).

[1471] 601. (S T dehydratase) Serine/threonine dehydratases pyridoxal-phosphate attachment site
Serine and threonine dehydratases [1,2] are functionally and structurally related pyridoxal-phosphate dependent
enzymes: - L-serine dehydratase (EC 4.2.1.13) and D-serine dehydratase (EC 4.2.1.14) catalyze the dehydration of
L-serine (respectively D-serine) into ammonia and pyruvate. - Threonine dehydratase (EC 4.2.1.16) (TDH) catalyzes the
dehydration of threonine into alpha-ketobutyrate and ammonia. In Escherichia coli and other microorganisms, two
classes of TDH are known to exist. One is involved in the biosynthesis of isoleucine, the other in hydroxyamino acid
catabolism. Threonine synthase (EC 4.2.99.2) is also a pyridoxal-phosphate enzyme; it catalyzes the transformation
of homoserine-phosphate into threonine. It has been shown [3] that threonine synthase is distantly related to the
serine/threonine dehydratases. In all these enzymes, the pyridoxal-phosphate group is attached to a lysine residue. The
sequence around this residue is sufficiently conserved to allow the derivation of a pattern specific to serine/threonine
dehydratases and threonine synthases.

Consensus pattern: [DESH]-x(4,5)-[STVG]-x-[AS]-[FYI]-K-[DLIFSA]-[RVMF]-[GA]-[LIVMGA] [The K is the pyridoxal-P
attachment site]

[ 1] Ogawa K., Gomi T., Konishi K., Date T., Nakashima H., Nose K., Matsuda Y., Peraino C., Pitot H.C., Fujioka M.
J. Biol. Chem. 264:15818-15823(1989). [ 2] Datta P., Goss T.J., Omnaas J.R., Patil R.V. Proc. Natl. Acad. Sci. U.S.A.
84:393-397(1987). [ 3] Parsot C. EMBO J. 5:3013-3019(1986). [ 4] Grabowski R., Hofmeister A.E.M., Buckel W. Trends
Biochem. Sci. 18:297-300(1993).

[1472] Cysteine synthase/cystathionine beta-synthase P-phosphate attachment site 

Cysteine synthase (CSase) is the pyridoxal-phosphate dependent enzyme responsible [1] for the formation of cysteine
from O-acetyl-serine and hydrogen sulfide with the concomitant release of acetic acid. In bacteria such as Escherichia
coli, two forms of the enzyme are known (genes cysK and cysM). In plants there are also two forms, one located in the
cytoplasm and the other in chloroplasts. Cystathionine beta-synthase [2] catalyzes the first irreversible step in
homocysteine transsulfuration: the conjugation of homocysteine and serine, forming cystathionine. Like CSase, it is a
pyridoxal-phosphate dependent enzyme. The two types of enzymes are evolutionarily related. The pyridoxal-phosphate group of
CSases has been shown to be attached to a lysine residue which is located in the N-terminal section of these enzymes;
the sequence around this residue is highly conserved and can be used as a signature pattern to detect this class of
enzymes.

Consensus pattern: K-x-E-x(3)-[PA]-[STAGC]-x-S-[IVAP]-K-x-R-x-[STAG]-x(2)-[LIVM] [The 2nd K is the pyridoxal-P
attachment site]

[ 1] Saito K., Kurosawa M., Murakoshi I. FEBS Lett. 328:111-114(1993). [ 2] Swaroop M., Bradley K., Ohura T., Tahara
T., Roper M.D., Rosenberg L.E., Kraus J.P. J. Biol. Chem. 267:11455-11461(1992).
[1473] 602. S locus glycop 

S-locus glycoprotein family. In Brassicaceae, self-incompatible plants have a self/non-self recognition sys-
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catalysts. y ^' enines esj are thought to be involved in substrate-binding and 

KIS^S^ 997 * 45 ^ 121 C ""— * B^e. j 1997 ,, 21 : 125 ,3 2 

™ - -~ — ^ es 

Two dLfem r£St« * ""f 08 ** phyt ° ene s * nthases Wures 
i wo ditterent polyisoprene synthases have been shown f 1 9 an!, ch, 

- Squalene synthase (EC 2.5 1 21) (farnesvl^hoST, / 1 , 3 nUmber ° f re 9 ions of sequence similarities- 
o'^cu.esoffaLsy^ 

pathway. The reaction carried out by SQS fe ratatyze^^i two s^oarata^ 00 ™ 1 !!!''^ S ' 6P ^ & cno ' es f ero. biosynthetic 
of the two molecules ot FPP to form presalSfdbhl^f T f ef>S: ,he ftr * is 3 h " d *>"'w«* condensation 
pendent reduction, to form squalene. eSKES Ik SoSs SZ£T*!L* *" * 3 NADP ^ 

by the FDFT, gene. SQS seems to be membrane™^ ERG9gene. in mammats 

the conversion of two mo.ecu.es of S^^^^^^^^ 0 "•'-> W. which caters 
^"thesis of carotenoids from ^ Phy, ° ene - " is the -top h the 

steps: the firs, is a head-tc-head «£den2^ 

mtermedate is then rearranged to form phytoene PSY IS L a ° diphosphate; this 

and photosymhetic bacteria as we., as sZ ™ ^J^S^^T ** ««*«»<** P'ants 

the gene crtB. ,n plants PSY is .ocataed in ^ThCSl ^ T ^ tacte * PSY * encoded "y 
and PSY share a number of functions, sim, artiS Xh are a.L^ fl ^"/^ d9SCrip,i0n aboVe ' S °S 
particular three well conserved regions are JS^^S^ J? °' ** 6UUCtWe 
° rt "e«t a |y.icmechanism. Signature pattern 

are localized in the centra, part of these enzjmes ^ SeC °" d 30(1 * W conseived re 9™s; they 

Monroy C.A., Bishop R.W. Mo.. Cel.. Bio. « 2706 n T ° ^ ^ YH ' KienZle 8 K - 
Kuntz M. Bochem. Biophys. Res. Con^^SgSSffl ' HU9Ueney ^ F - Camara 8 ' 

SUn? 1 SRP54 " ,ype proleins 01W **» *— • 

protein subunrts. One of these subunfts. t^^t.S^mThTSS J* 000818,8 * 8 ?S RNA ' six 
s.gna. sequence when it emerges from the ribosome ^e N feS™ pro,ei " ** """acts with the 

s.te(G^ain)andareevo.utic^ 

coH and Baci.lus subtilis ffh protein (P48) a -Escherichia 
associated with a 4.5S RNA in the pLJi^^J^S^^^r*^ °' SRP54 - F,h fe 

«ng protein), an integral membrane 'qTP%M^SS^JS^^ ^ ""^ «*»»«*«"» (dock- 

of nascent secretory proteins to the endoptelTc ^ f RR *• "™* **** 

extremrty of the protein. - Bacterial ftsY protein, a protein wh^TbZ^ ? " " l0Ca,ed at CMerminal 
protein in eukaryotes. The G^Jomain is located * the clS. I X ° P ' 3y 3 Similar ro,e to ,hat <* «» *cWno 
serra gonorrhoeae which seems to be^omol of toTTSST* J P, ° ,eia " ^ P " A pr ° tein from Nefe 
This protein is also believed to be a docking tSl^L gTZ , * archa *«^ SuKotobus solfataricus. 
biosynthesfe proteh iw- The best conserved JlSfSl" ^ ** °" termin " S - " 

GTP-binding site, bu, as those regions are no, JSc JZ^Zi^ ^ *■ are P 3 " - 

Instead, a consented region located a, the C-terrnTa. end of m« SSL™' * ? fC USed 38 3 8i B nato ~ P a «™- 
Consensus pattern P-f LIVMl-x fFYi ui iwim!I« ? * Xn3ln was selected. 

Nuceic Acids Res £ 93 Tl J^SJ ,V,M1 W 3S ^^ 1 1J Afthoff S., SeHnger D.. Wise J A 

Phosphate group attached to a sLe or evoSiS^S P ot^l ° tT" ^ the removal of a 

specificity. ., is inhibited by two thermostable pZX'l 2^' <PP1) ' S " ° f broad 

^=G)^^^ 

1 ( 9 ene ,T 4) is ,vo,ed , dephospho^ ^13^ ^KMSS 
gamma-1 and -2. - Mammalian phosphatidyl inositol 3-kinase regulatory p85 subunit. - Mammalian Ras GTPase-
activating protein (GAP). - Adaptor proteins mediating binding of guanine nucleotide exchange factors to growth factor
receptors: vertebrate GRB2, Caenorhabditis elegans sem-5 and Drosophila DRK, all of which have two SH3 domains.
- Mammalian Vav oncoprotein, a guanine nucleotide exchange factor of the CDC24 family. - Some guanine-nucleotide
releasing factors of the CDC25 family: yeast CDC25, yeast SCD25, fission yeast ste6. - MAGUK proteins. These
proteins consist of at least three types of domains: one or more copies of the DHR domain, an SH3 domain and a
C-terminal guanylate kinase domain. Members of this family are: Drosophila lethal(1)discs large-1 tumor suppressor
protein (gene Dlg1), mammalian tight junction protein ZO-1, vertebrate erythrocyte membrane protein p55,
Caenorhabditis elegans protein lin-2, rat protein CASK and mammalian synaptic proteins SAP90/PSD-95,
CHAPSYN-110/PSD-93, SAP97/DLG1 and SAP102. - Miscellaneous proteins interacting with vertebrate receptor protein tyrosine kinases:
mammalian cytoplasmic protein Nck (3 copies), oncoprotein Crk (2 copies). - Chicken Src substrate p80/85 protein
(cortactin) and the similar human hemopoietic lineage cell specific protein Hs1. - Mammalian dihydropyridine-sensitive
L-type calcium channel beta (regulatory) subunit, including the related human myasthenic syndrome antigen B (MSYB).
- Mammalian neutrophil cytosolic activators of NADPH oxidase: p47 (NCF-1), p67 (NCF-2), and a potential homolog
from Caenorhabditis elegans (B0303.7). NCF-1 and -2 have two copies of the SH3 domain, while B0303.7 has four. -
Some myosin heavy chains from amoebae, slime molds and yeast (gene MYO3). - Vertebrate and Drosophila spectrin
and fodrin alpha-chain. - Human amphiphysin. - Yeast actin-binding protein ABP1. - Yeast actin-binding protein SLA1
(3 copies). - Yeast protein BEM1 and the fission yeast homolog scd2 (or ral3) (2 copies). - Yeast BEM1-binding proteins
BOI2 (BEB1) and BOB1 (BOI1). - Yeast fusion protein FUS1. - Yeast protein RVS167. - Yeast protein SSU81. - Yeast
hypothetical proteins YAR014C (1 copy), YFR024c (1 copy), YHL002w (1 copy), YHR016c (1 copy), YJL020C (1 copy),
YHR114w (2 copies) and the fission yeast homolog SpAC12C2.05c. - Caenorhabditis elegans hypothetical protein
F42H10.3. The profile developed to detect SH3 domains is based on a structural alignment consisting of 5 gap-free
blocks and 4 linker regions totaling 62 match positions.

[ 1] Mayer B.J., Hamaguchi M., Hanafusa H. Nature 332:272-275(1988). [ 2] Musacchio A., Gibson T., Lehto V.P., Saraste
M. FEBS Lett. 307:55-61(1992). [ 3] Pawson T., Schlessinger J. Curr. Biol. 3:434-442(1993). [ 4] Mayer B.J., Baltimore
D. Trends Cell Biol. 3:8-13(1993). [ 5] Pawson T. Nature 373:573-580(1995). [ 6] Kuriyan J., Cowburn D. Curr.
Opin. Struct. Biol. 3:828-837(1993). [ 7] Morton C.J., Campbell I.D. Curr. Biol. 4:615-617(1994).
[1457] 590. Serine hydroxymethyltransferase pyridoxal-phosphate attachment site (SHMT) 
Serine hydroxymethyltransferase (EC 2.1.2.1) (SHMT) [1] catalyzes the transfer of the hydroxymethyl group of serine
to tetrahydrofolate to form 5,10-methylenetetrahydrofolate and glycine. In vertebrates, it exists in a cytoplasmic and a
mitochondrial form, whereas only one form is found in prokaryotes. Serine hydroxymethyltransferase is a
pyridoxal-phosphate containing enzyme. The pyridoxal-P group is attached to a lysine residue around which the sequence is
highly conserved in all forms of the enzyme.

Consensus pattern: [DEH]-[LIVMFY]-x-[STMV]-[GST]-[ST](2)-H-K-[ST]-[LF]-x-G-[PAC]-[RQ]-[GSA]-[GA] [K is the
pyridoxal-P attachment site]

[ 1] Usha R., Savithri H.S., Rao N.A. Biochim. Biophys. Acta 1204:75-83(1994).
[1458] 591. SIS domain 

SIS (Sugar ISomerase) domains are found in many phosphosugar isomerases and phosphosugar binding proteins.
[1] Teplyakov A, Obmolova G, Badet-Denisot MA, Badet B, Polikarpov I; Structure 1998;6:1047-1055.
[1459] 592. (SKI) Shikimate kinase signature 

Shikimate kinase (EC 2.7.1.71) catalyzes the fifth step in the biosynthesis from chorismate of the aromatic amino acids
(the shikimate pathway) in bacteria (gene aroK or aroL), plants and in fungi (where it is part of a multifunctional enzyme
which catalyzes five consecutive steps in this pathway). Shikimate kinase is a small protein of about 200 residues. A
conserved region that contains a run of three glycines has been selected as a signature pattern.

Consensus pattern: [KR]-x(2)-E-x(3)-[LIVMF]-x(8,12)-[LIVMF](2)-[SA]-x-G(3)-x-[LIVMF]. Proteins belonging to this
family also contain a copy of the ATP/GTP-binding motif 'A' (P-loop).
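
The shikimate kinase signature contains a variable-length spacer, x(8,12), which matches any run of 8 to 12 residues. The sketch below is an illustration only and not part of the original disclosure. It uses a hand translation of the pattern, read here as [KR]-x(2)-E-x(3)-[LIVMF]-x(8,12)-[LIVMF](2)-[SA]-x-G(3)-x-[LIVMF], and reports both where the signature starts and how long the spacer was in the match found. The function name and the test sequence are hypothetical.

```python
import re

# Hand translation of the signature; the capture group isolates the
# variable spacer x(8,12) so that its length can be reported.
SHIKIMATE_KINASE = re.compile(
    r"[KR].{2}E.{3}[LIVMF](.{8,12})[LIVMF]{2}[SA].G{3}.[LIVMF]"
)


def find_signature(sequence):
    """Return (1-based start, spacer length) of the first match, or None."""
    m = SHIKIMATE_KINASE.search(sequence)
    if m is None:
        return None
    return m.start() + 1, len(m.group(1))


# Hypothetical sequence with a 9-residue spacer between the conserved parts.
example = "MT" + "KAAEAAAL" + "DDDDDDDDD" + "LLSAGGGAL"
print(find_signature(example))
```

The regular expression engine tries the longest spacer first and shortens it until the rest of the pattern fits, so a sequence with a spacer outside the 8 to 12 range is not reported.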

[1460] 593. SNAP-25 family 

[1461] SNAP-25 (synaptosome-associated protein 25 kDa) proteins are components of SNARE complexes. Members
of this family contain a cluster of cysteine residues that can be palmitoylated for membrane attachment [2].
[1462] [1] Brennwald P, Kearns B, Champion K, Keranen S, Bankaitis V, Novick P; Cell 1994;79:245-258. [2] Risinger
C, Blomqvist AG, Lundell I, Lambertsson A, Nassel D, Pieribone VA, Brodin L, Larhammar D; J Biol Chem 1993;268:
24408-24414.

[1463] 594. SNF2 and others N-terminal domain

[1464] This domain is found in proteins involved in a variety of processes including transcription regulation (e.g.,
SNF2, STH1, brahma, MOT1), DNA repair (e.g., ERCC6, RAD16, RAD5), DNA recombination (e.g., RAD54), and
chromatin unwinding (e.g., ISWI) as well as a variety of other proteins with little functional information (e.g., lodestar, ETL1).
[1465] 595. Staphylococcal nuclease homologues (Snase) 
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the second is an almost perfectly conserved glycine-rich nonapeptide 
Consensus pattern: G-A-G-D-Q-G-xra Pi nrvVn_c„„ ■ 

Consensus pattern: GiGA^^J^^^ Kno *" n ,0 beton 9 to ,his "ass detected by the pattern: 
liiwj 586. SAICAR synthetase signatures 

PhosphoriDosylaminoimidazole-succinocarboxamide synthase /ecru w (<!4lri . D u 
enth step in thede novo purine bic^nmeticpathZ ,h^^^ 

idazole-4-carboxylic acid and asparlic acid to S^CAR flT^ h~? , <* S^hosphoribosyl-S-aminoim- 
SAICAR synthetase * a rnono,unc,bna? P « l lS^^S C ^X ^ ** Pbn,S ' 

zyme that also catalyze phosph 0 ribosylLnoi mida zS "£££l AJRCl iS? ™ ° f 8 «* 
central section of this enzyme have been selected as J«*E£™^ , l, V Tw ° consefved in the 

Consensus pattern: (UVM^-P-fL^ s Vn,he,ase. 
Consensus pattern: [LIVMHLIVMAJ-D-x-K-[UVMFY>E-F-G X(JHTA] ^ S " 

liS 587' SSS P T ^ *" ^ ^ Bi °'- ^59-287(1992). 
114S4J 587. (SCP) Extracellular proteins SCP/Tpx-l/AoS/PR-i/Sc? «iL=w„ 

a^r-^Sfe ■ 

(GliPR). - uzard helothermine. a toxin mat ^^^1*^7 ™ ^ P** 0 **™*-"**** protein 
and venom allergen 3 (Ag3) trom fire bS^^^^^Z^T * ^ ^ 

reactions to stings from insects of the hymenopterL famT m J2?l T "* ™ in 08086 of 
four disulfide bonds. - Rant pathogenesTprE* Zpr 'teSrnT °' 8b0Ut 200 residues and 
openhfect^orotherst^^^ 

three disulfide bonds. - Proteins Sc7 and Sen from tL Loo ^ . , 130 to 140 residues and probably contain 

cenularproteinsare^sely^^ 

P'°^y contain twodisulfidebon^^^ 

YJL078c.YJL079candYKR013w.Theexa^ioTtSs^ 

in their Ctermina. ha« have been selected a S5£ £££ The sS JS? TW> "T- ^ons located 
known to be involved in a disulfide bond in Ag5 signature contains a cysteine which is 

Consensus pattern: [GDERhH^FYWHJ-T-OCLIVMJf^-W-^HSTN] 

sulfide bond] 

: b jnvo|ved 

150:2823-2830(1993).[ 4J Dixon D C. Cult J R ' Ktessi^ TdFEM^TT^ JS*"" ° R ' Ki " 9 TR J 
geirsdottir SA, Kothe E.M., Scheer J.M J Wessete J G H J G^m 7 ; 1324 ( 1991 >l *J Schuren F.H.J.. As- 

[1455J 588. SET domain ° ea M,crobK > i - 139:2083-2090(1993). 

snr^r^zr "s^r ~ • - — ~ «* *■ 

P^haa^cwnranc (osPTPas.s) ^ * «»«»™»* <*P<ay similarity with aoKpedBdiy. 

[1456] 589. Src homology 3 (SH3) domain profile 

The Src homology 3 (SH3) domain is a small protein domain of about 60 amino-acid residues first identified as a
conserved sequence in the non-catalytic part of several cytoplasmic protein tyrosine kinases (e.g. Src, Abl, Lck) [1].
Since then, it has been found in a great variety of other intracellular or membrane-associated proteins. The
SH3 domain has a characteristic fold which consists of five or six beta-strands arranged as two tightly packed
anti-parallel beta sheets. The linker regions may contain short helices. The function of the SH3 domain is not well
understood. The current opinion is that they mediate assembly of specific protein complexes via binding to proline-rich
peptides [7]. In general SH3 domains are found as single copies in a given protein, but there is a significant number of
proteins with two SH3 domains and a few with 3 or 4 copies. So far SH3 domains have been found in the following
proteins: - Many vertebrate, invertebrate and retroviral cytoplasmic (non-receptor) protein tyrosine kinases, in particular
the Src, Abl, Bkt, Csk and ZAP families of kinases.
Consensus pattern: [RK]-G-{EDRKHPCG}-[AGSCI]-[FY]-[LIVA]-x-[FYLM]. In most cases the residue in position 3 of
the pattern is either Tyr or Phe.
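
The braces in the pattern above mark residues that are excluded at a position. As a minimal sketch, assuming Python's standard `re` module and a hand translation of the pattern (the function name and both test peptides are illustrative and are not taken from this document), the exclusion can be written as a negated character class:

```python
import re

# Hand translation of [RK]-G-{EDRKHPCG}-[AGSCI]-[FY]-[LIVA]-x-[FYLM]:
# {...} (any residue except these) becomes [^...], and x becomes '.'.
RNP1 = re.compile(r"[RK]G[^EDRKHPCG][AGSCI][FY][LIVA].[FYLM]")

def rnp1_position3(sequence):
    """Return the residue at pattern position 3 for every non-overlapping match."""
    return [m.group(0)[2] for m in RNP1.finditer(sequence)]

# Illustrative peptides only (assumptions, not sequences from this document).
print(rnp1_position3("AARGFGFVTFAA"))  # Phe at position 3 is allowed
print(rnp1_position3("AARGEGFVTFAA"))  # Glu at position 3 is excluded
```

The second peptide differs from the first only at position 3, which is enough to remove the match.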

[ 1] Bandziulis R.J., Swanson M.S., Dreyfuss G. Genes Dev. 3:431-437(1989). [ 2] Dreyfuss G., Swanson M.S.,
Pinol-Roma S. Trends Biochem. Sci. 13:86-91(1988). [ 3] Milburn S.C., Hershey J.W.B., Davies M.V., Kelleher K., Kaufman
R.J. EMBO J. 9:2783-2790(1990). [ 4] Szabo A., Dalmau J., Manley G., Rosenfeld M., Wong E., Henson J., Posner
J.B., Furneaux H.M. Cell 67:325-333(1991). [ 5] Rebagliati M. Cell 58:231-232(1989). [ 6] Li Y., Sugiura M. EMBO J.
9:3059-3066(1990). [ 7] Ayane M., Preuss U., Koehler G., Nielsen P.J. Nucleic Acids Res. 19:1273-1278(1991). [ 8]
Kawakami A., Tian Q., Duan X., Streuli M., Schlossman S.F., Anderson P. Proc. Natl. Acad. Sci. U.S.A. 89:8681-8685
(1992). [ 9] Minvielle-Sebastia L., Winsor B., Bonneaud N., Lacroute F. Mol. Cell. Biol. 11:3075-3087(1991).
[1443] 581. Rubredoxin signature 

[1444] Rubredoxins [1] are small electron-transfer prokaryotic proteins. They contain an iron atom which is ligated 
by four cysteine residues. Rubredoxins are, in some cases, functionally interchangeable with ferredoxins. 
[1445] A conserved region that includes two of the cysteine residues that bind the iron atom has been selected as 
a pattern for these proteins. 

[1446] Consensus pattern: [LIVM]-x(3)-W-x-C-P-x-C-[AGD] [The two C's bind the iron atom] 

In Pseudomonas oleovorans rubredoxin 2 (gene alkG) [2], this pattern is found twice because alkG has two rubredoxin
domains.

Rubrerythrin [3], a protein with inorganic pyrophosphatase activity from Desulfovibrio vulgaris, possesses a C-terminal
rubredoxin-like domain, but this domain is too divergent to be detected by the above pattern.
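
The consensus patterns in these entries can be turned into regular expressions mechanically. The following is a minimal sketch, assuming Python's standard `re` module; the helper names `prosite_to_regex` and `find_sites` and the toy sequence are illustrative assumptions, and only the pattern elements used in this section (`[..]`, `{..}`, `x`, `(n)`, `(n,m)`) are handled. It shows how a protein with two rubredoxin domains yields two hits.

```python
import re

def prosite_to_regex(pattern):
    """Translate a PROSITE-style consensus pattern into a regex string."""
    atoms = []
    for element in pattern.split("-"):
        element = element.strip()
        # Split the residue specification from an optional repeat: x(3), x(6,7).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"malformed pattern element: {element!r}")
        core, low, high = m.groups()
        if core == "x":                                      # any residue
            atom = "."
        elif core.startswith("[") and core.endswith("]"):    # allowed residues
            atom = core
        elif core.startswith("{") and core.endswith("}"):    # excluded residues
            atom = "[^" + core[1:-1] + "]"
        elif len(core) == 1 and core.isupper():               # single fixed residue
            atom = core
        else:
            raise ValueError(f"unsupported pattern element: {element!r}")
        if low is not None:
            atom += "{" + low + ("," + high if high else "") + "}"
        atoms.append(atom)
    return "".join(atoms)

def find_sites(pattern, sequence):
    """Return 0-based start positions of all (possibly overlapping) matches."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [m.start() for m in regex.finditer(sequence)]

RUBREDOXIN = "[LIVM]-x(3)-W-x-C-P-x-C-[AGD]"

# Made-up sequence containing two copies of the motif (not a real rubredoxin).
toy = "MK" + "LAAAWACPACG" + "TTTT" + "VDEKWKCPDCA" + "EE"

print(prosite_to_regex(RUBREDOXIN))
print(find_sites(RUBREDOXIN, toy))
```

A lookahead is used in `find_sites` so that overlapping occurrences are also reported; a plain `finditer` on the translated regex would skip them.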

[1447] [ 1] Berg J.M., Holm R.H. (In) Iron-sulfur proteins, Spiro T.G., Ed., pp1-66, Wiley, New-York, (1982). [ 2] Kok
M., Oldenhuis R., van der Linden M.P.G., Meulenberg C.H.C., Kingma J., Witholt B. J. Biol. Chem. 264:5442-5451(1989).
[ 3] van Beeumen J.J., van Driessche G., Liu M.-Y., Le Gall J. J. Biol. Chem. 266:20645-20653(1991).

[1448] 582. (rvp) Eukaryotic and viral aspartyl proteases active site 

Aspartyl proteases, also known as acid proteases (EC 3.4.23.-), are a widely distributed family of proteolytic enzymes
[1,2,3] known to exist in vertebrates, fungi, plants, retroviruses and some plant viruses. Aspartate proteases of eukaryotes
are monomeric enzymes which consist of two domains. Each domain contains an active site centered on a catalytic
aspartyl residue. The two domains most probably evolved from the duplication of an ancestral gene encoding a
primordial domain. Currently known eukaryotic aspartyl proteases are: - Vertebrate gastric pepsins A and C (also known
as gastricsin). - Vertebrate chymosin (rennin), involved in digestion and used for making cheese. - Vertebrate lysosomal
cathepsins D (EC 3.4.23.5) and E (EC 3.4.23.34). - Mammalian renin (EC 3.4.23.15) whose function is to generate
angiotensin I from angiotensinogen in the plasma. - Fungal proteases such as aspergillopepsin A (EC 3.4.23.18),
candidapepsin (EC 3.4.23.24), mucoropepsin (EC 3.4.23.23) (mucor rennin), endothiapepsin (EC 3.4.23.22),
polyporopepsin (EC 3.4.23.29), and rhizopuspepsin (EC 3.4.23.21). - Yeast saccharopepsin (EC 3.4.23.25) (proteinase A)
(gene PEP4). PEP4 is implicated in posttranslational regulation of vacuolar hydrolases. - Yeast barrier pepsin (EC
3.4.23.35) (gene BAR1); a protease that cleaves alpha-factor and thus acts as an antagonist of the mating pheromone.
- Fission yeast sxa1 which is involved in degrading or processing the mating pheromones. Most retroviruses and some
plant viruses, such as badnaviruses, encode an aspartyl protease which is a homodimer of a chain of about 95 to
125 amino acids. In most retroviruses, the protease is encoded as a segment of a polyprotein which is cleaved during
the maturation process of the virus. It is generally part of the pol polyprotein and, more rarely, of the gag polyprotein.
Conservation of the sequence around the two aspartates of eukaryotic aspartyl proteases and around the single active
site of the viral proteases allows us to develop a single signature pattern for both groups of protease.
Consensus pattern: [LIVMFGAC]-[LIVMTADN]-[LIVFSA]-D-[ST]-G-[STAV]-[STAPDENQ]-x-[LIVMFSTNC]-x-[LIVMFGTA]
[D is the active site residue]
[ 1] Foltmann B. Essays Biochem. 17:52-84(1981). [ 2] Davies D.R. Annu. Rev.
Biophys. Chem. 19:189-215(1990). [ 3] Rao J.K.M., Erickson J.W., Wlodawer A. Biochemistry 30:4663-4671(1991). [ 4]
Rawlings N.D., Barrett A.J. Meth. Enzymol. 248:105-120(1995).
[1449] 583. (rvt) Reverse transcriptase (RNA-dependent DNA polymerase) 

A reverse transcriptase gene is usually indicative of a mobile element such as a retrotransposon or retrovirus. Reverse
transcriptases occur in a variety of mobile elements, including retrotransposons, retroviruses, group II introns, bacterial
msDNAs, hepadnaviruses, and caulimoviruses. Number of members: 1233

[1450] [1] Medline: 91006031. Origin and evolution of retroelements based upon their reverse transcriptase sequences.
Xiong Y, Eickbush TH; EMBO J 1990;9:3353-3362.
[1451] 584. (S-AdoMet synt) S-adenosylmethionine synthetase signatures 

S-adenosylmethionine synthetase (EC 2.5.1.6) is the enzyme that catalyzes the formation of S-adenosylmethionine
(AdoMet) from methionine and ATP [1]. AdoMet is an important methyl donor for transmethylation and is also the
propylamine donor in polyamine biosynthesis. In bacteria there is a single isoform of AdoMet synthetase (gene metK),
there are two in budding yeast (genes SAM1 and SAM2) and in mammals while in plants there is generally a multigene
family. The sequence of AdoMet synthetase is highly conserved throughout isozymes and species. Two signature
patterns have been selected for this type of enzyme; the first is a hexapeptide which seems to be involved in ATP-binding;
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c,ave the „ ^ .inkage of RNA vfe a 
lutionary related to these fungal eTz2^^ n ^£r^T^ " ****** b ° en ,0Und to be ev °- 
9 ene (S-gene) that has several a.leles ™ is ^ZiSS^l^^ ? f C ° n,r ° l,ed by 3 Si " 9le 
two S- alleles expressed in the style The seSola.ibl m It^T or by pollen bearing either of the 

family has been shown [2,3] to be a toonucle^^ 

14]. These two enzymes are 'probably S^p^SlSS!^ LE and U ,r0m t0ma, ° 

RNAse I (EC 3.1.27.6) (gene rna) [I] Te^TSn^T T reSC " e s * s,em " Escherichia coli periptasmic 

RNase T2 and Rh. These hJZZZE^JZZ 2^,? *T h ** ™<*^of 

above. Two signature patterns have beenlvelSTd Z f or C ° nSe ' Ved h a " ,he Sequence described 

a,so contains a cyste™ which is known ,o be ^-"-P-Urn, 
Consensus pattern: [FYWL]-x-[LIVMJ-H-G-L-W-P [H is an active site residue] 

a^dTb^ 

BA. Haring V., Ebert P.*' Anderson M A. Sim^sS ^ Sa^ F cISaT Z!*"^" 0 " 31 McC "" e 
effler A., Glund K., Irie M. Eur. J. Biochem. 214^^,S^|^?m I Na, 7j* 2;959 57<1989).[ 4] Lo- 
Kawata Y, Sakiyama F Hayashi F Kvoooku V Fm Tr k ,i, " '"" Kenne " D Gene 95:1-7(1990).] 61 

K., ,ie M., M Jo K. NaSufa xV^i^^s^^ 1 " 0 " 71 IQ — *. M ^ «J 

necessary for DNA synthesis. l£nu3SS?5^^2Sf ,fl nb0nUC ' e0,k,es 11 provkJes *• P^rsors 

1000re S idues)andasn^lsubun M 30^ 

chain from promotes, eukaryotes and viruses^r^^ 

Science 260:1773-1777(19937 ^ ^^'^M 2] Reichard P. 

[1440] 579. RNase H 

-unTas^ SST. ^ in retrovira, rep.^on cyc te , and often 

11441, SeC.Eukaryoticputa,^ 

Heterogeneous nuCear ribonucleop^ef^ hnRNP £ aJSSZSJ" T h ^ ,Mn ° pro,eins: ~ 

- hnRNP C (C1/C2) (once). - hnRNP E (UP2) ^o^hSS?^^ 0 ^ " 

** - U1 snRNP 70 Kd (once). - U1 snRNP A (once U2^Sjp 5? ? ^ Sma " nUC ' ear ^oncoproteins 

"-™e*s y n,hesi S La«L^ 

■ Nuc.eo.in (4 times). - Yeas, single-stranded nuSac^ ^ 
(twice . NSR1 is invotved in pre-rRNAorocessino- it trw^iK, k ^ ? . B1) (once) - ' Yeast P ro,ein NSR1 
protein (PABP) (4 times) *• ahers " ^Tht * nUCleaf ^'^on sequences. - Poly(A) binding 

determination pL^Lto^r 2 ^^r^ZS^r^ (SX,) *«* " « 

the RNA metabo.ism of neuronT- Hu ^ i^bsSeLf T" <3 " hfch " pr0bab * * 
simi b r,oefcvandwh,hn.y P ,ayar^ 

segment-polarity homeobox protein that may also bind to speicific mRNAs J^** *"* P^tein (once, £]. a 
play a role in the transcription of RNA polymerase IN The m 2f c ? ^ ", (0nC6) - 3 pro,ein *** ma V 

- A maize protein induced by abseil ac^^ponse 'to Stefs^s 3 RNP ""^ pro,e * 

Three tobacco proteins, located in thechloZasTf^rh Z k !' ^ 88ems ,0 be a R NA*h*»g protein. - 
RNAs (twfce). - X16 [7 . a «!TpSX^ 

.iteration and/or maturation. - Insulin-inLed groS Zo^SS C.T,? ff" 9 '? W " h Ce " U,ar pro " 

T.AR (3 times) ]8, which possesses nuc.eo.ytic a^SCoxfcCh^ " ™ "* 

apoptosis. - Yeast RNA1 5 protein, which Dbvs a rote J^JTlj^ terQet Cells ' "** be ■""v 0 ^ h 



EP 1 033 405 A2 

loop. Examples of such proteins are the E1-E2 ATPases or the glycolytic kinases. In other ATP- or GTP-binding proteins
the flexible loop exists in a slightly different form; this is the case for tubulins or protein kinases. A special mention must
be reserved for adenylate kinase, in which there is a single deviation from the P-loop pattern: in the last position Gly is
found instead of Ser or Thr.
Consensus pattern: [AG]-x(4)-G-K-[ST]

In addition to the proteins listed above, the 'A' motif is also found in a number of other proteins. Most of these proteins
probably bind a nucleotide, but others are definitely not ATP- or GTP-binding (as for example chymotrypsin, or human
ferritin light chain).
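
The observation that some matching proteins do not bind ATP or GTP is consistent with how short the pattern is. As a rough sketch, assuming (for illustration only) that all 20 amino acids are equally likely and independent at each position, the chance that a given window of eight residues matches [AG]-x(4)-G-K-[ST] can be computed directly. The function name and the toy peptide below are hypothetical.

```python
import re
from fractions import Fraction

# Regex form of the consensus pattern [AG]-x(4)-G-K-[ST].
P_LOOP = re.compile(r"[AG].{4}GK[ST]")

def chance_match_probability(allowed_counts, alphabet_size=20):
    """Probability that a random window matches, given the number of
    allowed residues at each pattern position (uniform-frequency assumption)."""
    p = Fraction(1)
    for n in allowed_counts:
        p *= Fraction(n, alphabet_size)
    return p

# [AG] allows 2 residues, each x allows all 20, G and K allow 1, [ST] allows 2.
p = chance_match_probability([2, 20, 20, 20, 20, 1, 1, 2])
print(p)               # per-window probability
print(p * 1_000_000)   # expected chance matches in one million windows

# Made-up peptide containing one occurrence of the motif.
print(P_LOOP.search("MSGAPGSGKST").start())
```

Under these simplifying assumptions a database of a few million residues is expected to contain dozens of chance occurrences, so a match is only suggestive unless the protein family supports it.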

[ 1] Walker J.E., Saraste M., Runswick M.J., Gay N.J. EMBO J. 1:945-951(1982). [ 2] Moller W., Amons R. FEBS Lett.
186:1-7(1985). [ 3] Fry D.C., Kuby S.A., Mildvan A.S. Proc. Natl. Acad. Sci. U.S.A. 83:907-911(1986). [ 4] Dever T.E.,
Glynias M.J., Merrick W.C. Proc. Natl. Acad. Sci. U.S.A. 84:1814-1818(1987). [ 5] Saraste M., Sibbald P.R., Wittinghofer
A. Trends Biochem. Sci. 15:430-434(1990). [ 6] Koonin E.V. J. Mol. Biol. 229:1165-1174(1993). [ 7] Higgins C.F., Hyde
S.C., Mimmack M.M., Gileadi U., Gill D.R., Gallagher M.P. J. Bioenerg. Biomembr. 22:571-592(1990). [ 8] Hodgman
T.C. Nature 333:22-23(1988) and Nature 333:578-578(1988) (Errata). [ 9] Linder P., Lasko P., Ashburner M., Leroy P.,
Nielsen P.J., Nishi K., Schnier J., Slonimski P.P. Nature 337:121-122(1989). [10] Gorbalenya A.E., Koonin E.V.,
Donchenko A.P., Blinov V.M. Nucleic Acids Res. 17:4713-4730(1989).
[1431] GTP-binding nuclear protein ran signature (ras) 

Ran (or TC4) is a small abundant nuclear protein that binds and hydrolyzes GTP and which has been implicated in a
large number of processes including nucleocytoplasmic transport, RNA synthesis, processing and export, and cell cycle
checkpoint control [1,2]. Ran is generally included in the RAS 'superfamily' of small GTP-binding proteins [3], but it is
only slightly related to the other RAS proteins. It also differs from RAS proteins in that it lacks cysteine residues at its
C-terminal and is therefore not subject to prenylation. Instead ran has an acidic C-terminus. It is, however, similar to
RAS family members in requiring a specific guanine nucleotide exchange factor (GEF) and a specific GTPase activating
protein (GAP) as stimulators of overall GTPase activity. The region of the GTP-binding B motif which, in ran, is perfectly
conserved has been selected as a signature pattern. Consensus pattern: D-T-A-G-Q-E-K-[LF]-G-G-L-R-[DE]-G-Y-Y.
Proteins belonging to this family also contain a copy of the ATP/GTP-binding motif 'A' (P-loop).
[ 1] Scheffzek K., Klebe C., Fritz-Wolf K., Kabsch W., Wittinghofer A. Nature 374:378-381(1995). [ 2] Rush M.G., Drivas
G., d'Eustachio P. BioEssays 18:103-112(1996). [ 3] Valencia A., Chardin P., Wittinghofer A., Sander C. Biochemistry
30:4637-4648(1991).
[1432] 574. rec A signature 

The bacterial recA protein [1,2,3,E1] is essential for homologous recombination and recombinational repair of DNA
damage. RecA has many activities: it filaments, it binds to single- and double-stranded DNA, it binds and hydrolyzes
ATP, it is also a recombinase and, finally, it interacts with lexA causing its activation and leading to its autocatalytic
cleavage. RecA is a protein of about 350 amino-acid residues. Its sequence is very well conserved [3,4,5,E1] among
eubacterial species. It is also found in the chloroplast of plants [6]. The best conserved region, a nonapeptide located
in the middle of the sequence which is part of the monomer-monomer interface in a recA filament, has been selected
as a signature pattern.

Consensus pattern: A-L-[KR]-[IF]-[FY]-[STA]-[STAD]-[LIVMQ]-R

[ 1] Smith K.C., Wang T.-C. V. BioEssays 10:12-16(1989). [ 2] Lloyd A.T., Sharp P.M. J. Mol. Evol. 37:399-407(1993).
[ 3] Roca A.I., Cox M.M. Prog. Nucleic Acids Res. Mol. Biol. 56:129-223(1997). [ 4] Karlin S., Weinstock G.M., Brendel
V. J. Bacteriol. 177:6881-6893(1995). [ 5] Eisen J.A. J. Mol. Evol. 41:1105-1123(1995). [ 6] Cerutti H.D., Osman M.,
Grandoni P., Jagendorf A.T. Proc. Natl. Acad. Sci. U.S.A. 89:8068-8072(1992). [E1] http://www.tigr.org/~jeisen/RecA/
RecA.html

[1433] 575. Response regulator receiver domain 

This domain receives the signal from the sensor partner in bacterial two-component systems. It is usually
found N-terminal to a DNA binding effector domain.

[1] Pao GM, Saier MH; J Mol Evol 1995;40:136-154. 

[1434] 576. Ribonucleotide reductase large subunit signature 

Ribonucleotide reductase (EC 1.17.4.1) [1,2] catalyzes the reductive synthesis of deoxyribonucleotides from their
corresponding ribonucleotides. It provides the precursors necessary for DNA synthesis. Ribonucleotide reductase is
an oligomeric enzyme composed of a large subunit (700 to 1000 residues) and a small subunit (300 to 400 residues).
There are regions of similarities in the sequence of the large chain from prokaryotes, eukaryotes and viruses. One of
these regions has been selected as a signature pattern.

[1435] Consensus pattern: W-x(2)-[LF]-x(6,7)-G-[LIVM]-[FYRA]-[NH]-x(3)-[STAQLIVM]-[ASC]-x(2)-[PA]

[ 1] Nilsson O., Lundqvist T., Hahne S., Sjoberg B.-M. Biochem. Soc. Trans. 16:91-94(1988). [ 2] Reichard P. Science
260:1773-1777(1993).

[1436] 577. Ribonuclease T2 family histidine active sites

The fungal ribonucleases T2 from Aspergillus oryzae, M from Aspergillus saitoi and Rh from Rhizopus niveus are
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■ Campylobacter jejuni cell binding factor 2 (CBF2), a secreted antigen 

- Bacillus subtilis hypothetical protein yacD.

- Helicobacter pylori hypothetical protein HP0175.

- A hypothetical slime mold protein. 

LTJ^^:TZT ,ne M ^ 3 r ° ,e " ^ - — enzy.es has 

- Consensus pattern: NGSADEIJ-x W A«(3>^^^ 

[ 1] Fischer G., Schmid F.X.
Biochemistry 29:2205-2212(1990).

[ 2] Rudd K.E., Sofia H.J., Koonin E.V., Plunkett G. III, Lazar S.,
Rouviere P.E. Trends Biochem. Sci. 20:14-15(1995).
[ 3] Rahfeld J.-U., Ruecknagel K.P., Schelbert B., Ludwig B., Hacker J.,
Mann K., Fischer G. FEBS Lett. 352:180-184(1994).

[1428] 571. (RrnaAD) Ribosomal RNA adenine dimethylases signature

-n^^ — - — BNAs (EC 2.1,48) have been 

*sgA .eads ,o resist to the e^J^^ZS^Z^ " ^ °' 

" ^tSSZET&r D,M1> - " ,0 ^ *- twin ade- 

" S££%Z2^^ to -^.ncosarnide-streptograrnin B (MLS) 

"a reduced affinity bLeen'ri^^^ 

- Caenorhabditis elegans hypothetical protein E02H1.1.

p^tsr^ 

[ 2] Lafontaine D., Delcour J., Glasser A.L., Desgres J., Vandenhaute J. J. Mol. Biol. 241:492-497(1994).

conserved of these motifs is a glycine-rich region, which typically forms a flexible loop between a beta-strand and an
alpha-helix. This loop interacts with one of the phosphate groups of the nucleotide. This sequence motif is generally
referred to as the 'A' consensus sequence [1] or the 'P-loop' [5]. There are numerous ATP- or GTP-binding proteins in
which the P-loop is found. A number of protein families for which the relevance of the presence of such a motif has
been noted are listed below: - ATP synthase alpha and beta subunits. - Myosin heavy chains. - Kinesin heavy chains
and kinesin-like proteins. - Dynamins and dynamin-like proteins. - Guanylate kinase. - Thymidine kinase. - Thymidylate
kinase. - Shikimate kinase. - Nitrogenase iron protein family. - ATP-binding proteins involved in 'active
transport' (ABC transporters) [7]. - DNA and RNA helicases [8,9,10]. - GTP-binding elongation factors (EF-Tu, EF-1alpha,
EF-G, EF-2, etc.). - Ras family of GTP-binding proteins. - Nuclear protein ran.
- ADP-ribosylation factors family.
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[ 1) Engemann S., Herfurth E., Briesemeister U., Wittmann-Liebold B. 

J. Protein Chem. 14:189-195(1995). 

[1423] 567. Ribosomal protein S9 signature 

Ribosomal protein S9 is one of the proteins from the small ribosomal subunit. It belongs to a family of ribosomal proteins
which, on the basis of sequence similarities [1,2], groups: - Eubacterial S9. - Algal chloroplast S9.

- Cyanelle S9. - Archaebacterial S9. - Mammalian S16. - Plant S16.
- Yeast mitochondrial ribosomal S9.

A conserved region containing many charged residues and located in the central section of these proteins has been
selected as a signature pattern.

- Consensus pattern: G-G-G-x(2)-[GSA]-Q-x(2)-[SA]-x(3)-[GSA]-x-[GSTAV]-[KR]-[GSAL]-[LIF]

[ 1] Chan Y.-L., Paz V., Olvera J., Wool I.G. FEBS Lett. 263:85-88(1990).
[ 2] Otaka E., Hashimoto T., Mizuta K.
Protein Seq. Data Anal. 5:285-300(1993).

[1424] 568. Ribulose-phosphate 3-epimerase family signatures 

Ribulose-phosphate 3-epimerase (EC 5.1.3.1) (also known as pentose-5-phosphate 3-epimerase or PPE) is the
enzyme that converts D-ribulose 5-phosphate into D-xylulose 5-phosphate in Calvin's reductive pentose phosphate cycle.
In Alcaligenes eutrophus two copies of the gene coding for PPE are known [1]: one is chromosomally encoded (cbbEC),
the other one is on a plasmid (cbbeP). PPE has been found in a wide range of bacteria, archaebacteria, fungi and plants.
The sequence of PPE is highly related to:

- Escherichia coli D-allulose-6-phosphate 3-epimerase (gene alsE).

- Escherichia coli protein sgcE.

- Mycoplasma genitalium hypothetical protein MG112.

All these proteins have from 209 to 241 amino acid residues.

Two conserved regions, which are located respectively in the N-terminal and in the central part of these proteins, have
been selected as signature patterns.

- Consensus pattern: [LIVMF]-H-[LIVMFY]-D-[LIVM]-x-D-x(1,2)-[FY]-[LIVM]-x-N-x-[STAV]

- Consensus pattern: [LIVMA]-x-[LIVM]-M-[ST]-[VS]-x-P-x(3)-G-Q-x-F-x(6)-[NK]-[LIVMC]

[ 1] Kusian B., Yoo J.G., Bednarski R., Bowien B.
J. Bacteriol. 174:7337-7344(1992).

[1425] 569. (Ricin B lectin) Similarity to lectin domain of ricin beta-chain, 3 copies. 
[1426] This family consists of a triplicated domain involved in cell agglutination in ricin. 
[1427] 570. (Rotamase) PpiC-type peptidyl-prolyl cis-trans isomerase signature 

Peptidyl-prolyl cis-trans isomerase (EC 5.2.1.8) (PPIase or rotamase) is an enzyme that accelerates protein folding
by catalyzing the cis-trans isomerization of proline imidic peptide bonds in oligopeptides [1]. Most characterized PPIases
belong to two families, the cyclophilin-type (see <PDOC00154>) and the FKBP-type (see <PDOC00426>). Recently
a third family has been discovered [2,3]. So far, the only biochemically characterized member of this family is the
Escherichia coli protein parvulin (gene ppiC), a small (92 residues) cytoplasmic enzyme that prefers amino acid
residues with hydrophobic side chains like leucine and phenylalanine in the P1 position of the peptide substrates. PpiC
is evolutionarily related to a number of proteins that are also probably PPIases:

- Escherichia coli and Haemophilus influenzae ppiD. PpiD is a PPIase which contains a periplasmic ppiC-like domain
anchored to the inner membrane and which seems to be involved in the folding of outer membrane proteins.

- Escherichia coli surA. SurA is a periplasmic protein that contains two ppiC-like domains.

- Nitrogen-assimilating bacteria protein nifM which is involved in the activation and stabilization of the iron component
(nifH) of nitrogenase.

- Bacillus subtilis protein prsA, a membrane-bound lipoprotein involved in protein export.

- Lactococcus and lactobacillus protease maturation protein prtM, a membrane-bound lipoprotein involved in the
maturation of a secreted serine proteinase. - Yeast protein ESS1/PTF1 (processing/termination factor 1).

- Drosophila protein dodo (gene dod). - Mammalian protein PIN1,
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- Algal and plan, chloroplast S7. - Cyanelle S7. - Archaebacterial S7 

- Plant mitochondrial S7. - Mammalian S5. - Plant S5.

- Caenorhabditis elegans S5 (T05E11.1).

■ Consensus pattern: P*NSKWUVM^^ 

liSSSS S " Ffanke R> Bef9mann U - K0S,ka S ■ B. Biol. Chem. Hop P e-S ey ,er 374: 

! 2 SS? E "k H ^ Sh ' moto T - Mi2uta K- Protein Seq. Data Anal. 5:285-300(1993) 

[ 3] Ignatovich O., Cooper M., Kulesza H.M., Beggs J.D. Nucleic Acids Res. 23:4616-4619(1995).

[1417] 564. Ribosomal protein S7e signature 

^mS^^T** nbOSOmal « be « *• «-* - ^uence simifcrities [,,. One o, 

Mammalian S7. 
Xenopus S8. 
Insect S7. 

- Yeast probable ribosomal protein S7 (N2212).

- Fission yeast probable ribosomal protein S7 (SpAC18G6.13c).

These proteins have about 200 amino acids. A hiqhly conserved strAtrh mm 

Th,-^ >^^«^<>^^«^^^^„ s ^^ 

■ O-^m. p,„ om: [G^BHUMBHSIYHSTl-xW^UVMI^HAGHKRHAYH 

[ 1] Otaka E., Hashimoto T., Mizuta K.
Protein Seq. Data Anal. 5:285-300(1993).
[1422] 566. Ribosomal protein S8e signature

lessees?" — - M . . mMes 

s«IMMa, a , igra i u „ pa „ em ™* ,m ™ lssol ""«»'>*'*i«ncl.nposlnv«l,«lB W d residue to e „ 

- C °«»™>»I«<e«lKRhxBHS^GA W 5HHRHKGHKB h ,^ HLM ,^ 
- Consensus pattern: H-x-K-R-[LIVMF]-[SANK]-x-P-x(2)-[WY]-x-[LIVM]-x-[KRP]

[ 1] Fisher E.M., Beer-Romero P., Brown L.G., Ridley A., McNeil J.A., Lawrence J.B., Willard H.F., Bieber F.R.,
Page D.C. Cell 63:1205-1218(1990).

[ 2] Braun H.P., Emmermann M., Mentzel H., Schmitz U.K. Biochim. Biophys. Acta 1218:435-438(1994).
[1413] 560. Ribosomal protein S5 signature 

Ribosomal protein S5 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S5 is known to be
important in the assembly and function of the 30S ribosomal subunit. Mutations in S5 have been shown to increase
translational error frequencies. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities
[1,2], groups: - Eubacterial S5.

- Cyanelle S5. - Red algal chloroplast S5. - Archaebacterial S5.

- Mammalian S2 (LLrep3). - Caenorhabditis elegans S2 (C49H3.11).

- Drosophila S2. - Plant S2. - Yeast S4 (SUP44). - Fungi mitochondrial S5.

S5 is a protein of 166 to 254 amino-acid residues. The signature pattern for this protein is based on a conserved region,
rich in glycine residues, and located in the N-terminal section of these proteins.

- Consensus pattern: G-[KRQ]-x(3)-[FY]-x-[ACV]-x(2)-[LIVMA]-[LIVM]-[AG]-[DN]-x(2)-G-x-[LIVM]-G-x-[SAG]-x
(5,6)-[DEQ]-[LIVMA]-x(2)-A-[LIVMF]

[ 1] All-Robyn J.A., Brown N., Otaka E., Liebman S.W.
Mol. Cell. Biol. 10:6544-6553(1990). [ 2] Otaka E., Hashimoto T., Mizuta K. Protein Seq. Data Anal. 5:285-300(1993).
[1414] 561. Ribosomal protein S6 signature

Ribosomal protein S6 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S6 is known to bind
together with S18 to 16S ribosomal RNA. It belongs to a family of ribosomal proteins which, on the basis of sequence
similarities, groups: - Eubacterial S6. - Red algal chloroplast S6.

- Cyanelle S6.

S6 is a protein of 95 to 208 amino-acid residues. The signature pattern for this protein is based on a conserved region
located in the N-terminal section of these proteins.

- Consensus pattern: G-x-[KRC]-[DENQRH]-L-[SA]-Y-x-I-[KRNSA]
[1415] 562. Ribosomal protein S6e signature 

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities.
One of these families consists of:

- Mammalian S6 [1]. - Drosophila S6 [2]. - Plant S6 [3]. - Yeast S10 (YS4).

- Halobacterium marismortui HS13 [4]. - Methanococcus jannaschii MJ1260.

S6 is the major substrate of protein kinases in eukaryotic ribosomes [5]; it may have an important role in controlling
cell growth and proliferation through the selective translation of particular classes of mRNA.

These proteins have 135 to 249 amino acids.

A conserved stretch of 12 residues in the N-terminal part of these proteins has been selected as a signature pattern.

- Consensus pattern: [LIVM]-[STAMR]-G-G-x-D-x(2)-G-x-P-M

[ 1] Franco R., Rosenfeld M.G. J. Biol. Chem. 265:4321-4325(1990).

[ 2] Watson K.L., Konrad K.D., Woods D.F., Bryant P.J. Proc. Natl. Acad. Sci. U.S.A. 89:11302-11306(1992).
[ 3] Hansen G., Estruch J.J., Spena A. Nucleic Acids Res. 20:5230-5230(1992).

[ 4] Kimura M., Arndt E., Hatakeyama T., Hatakeyama T., Kimura J. Can. J. Microbiol. 35:195-199(1989).
[ 5] Bandi H.R., Ferrari S., Krieg J., Meyer H.E., Thomas G. J. Biol. Chem. 268:4530-4533(1993).

[1416] 563. Ribosomal protein S7 signature 

Ribosomal protein S7 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S7 is known to bind 
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These proteins have from 220 to 250 amino acids 

A conserved stretch in their N-terminal section was selected as a signature pattern. 

- Consensus pattern: [LIV]-x-[GH]-R-[IV]-x-E-x-[SC]-L-x-D-L

[ 1] Liu J.K., Reid D.M.
Plant Physiol. 109:338-338(1995).

[1410] 557. Ribosomal protein S3 signature 

sequence 

- Algal and plant chloroplast S3. - Cyanelle S3. - Archaebacterial S3.

- Plant mitochondrial S3. - Vertebrate S3. - Insect S3.

- Caenorhabditis elegans S3 (C23G10.3). - Yeast S3 (Rp13).

S3 is a protein of 209 to 559 amino-acid residues.

A conserved region located in the C-terminal section has been selected as a signature pattern.

[ 1] Otaka E., Hashimoto T., Mizuta K.
Protein Seq. Data Anal. 5:285-300(1993).
[1411] 558. Ribosomal protein S4 signature 

Z^S^^S^^^^ M * cc, 84 is Known to bind 

t° a family of ribosomal P ™^^ 

and plant chloroplast ' *" *"* °' SeqUen ° e S ' milari,ies ™ 9 rou P* " Abacterial 84. - Algal 

- Cyanelle S4. - Archaebacterial S4. - Mammalian S9. - Yeast YS11 (SUP45).

- Marchantia polymorpha mitochondrial S4. - Dictyostelium discoideum rp1024.

!p!JS^p K "u Ha l him ° ,OT - Su2ukiK I -.O^kaE. Nucleic Acids Res. 19:2603-2608(1991) 

3 £.1 M n T \ MiZUta K Pr0,ein Seq 0313 Anal - 5:285-300(1993) ( > ' 
l 3 J Boguta M., Dmochowska A., Borsuk P Wrobel K fiarnn.m a i Q , , . ~ 

Kruszewska A. Mol. Cell. Biol. 12:402^1 2(1 992^ J ' S ^ m5ki P " Szc2e ^ B. ( 

[1412] 559. Ribosomal protein S4e signature

O^Ztr^ZT^T^ ribOSOmal Pf0teinS ™ be °" - «— - — ce similes. 

" .t^Sr^ 

- Plant cytoplasmic S4 [2]. - Yeast S7 (YS6). - Archaebacterial S4e.
These proteins have 233 to 264 amino acids.
A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities.
One of these families consists of:

- Vertebrate S24 [1]. - Yeast Rp50. - Mucor racemosus S24 [2].
- Halobacterium marismortui HS15 [3]. - Methanococcus jannaschii MJ0394.
These proteins have 101 to 148 amino acids.

A well conserved stretch in the central part of these proteins has been selected as a signature pattern.

- Consensus pattern: [FYA]-G-x(2)-[KR]-[STA]-x-G-[FY]-[GA]-x-[LIVM]-Y-[DN]-[SDN]

[ 1] Brown S.J., Jewell A., Maki C.G., Roufa D.J. Gene 91:293-296(1990).
[ 2] Sosa L., Fonzi W.A., Sypherd P.S.

[1406] Nucleic Acids Res. 17:9319-9331(1989). [ 3] Kimura J., Arndt E., Kimura M. FEBS Lett. 224:65-70(1987).
[1407] 554. Ribosomal protein S26e signature 

A number of eukaryotic ribosomal proteins can be grouped on the basis of sequence similarities. One of these families 
consists of: - Mammalian S26 [1 ]. 

- Octopus S26 [2]. - Drosophila S26 (DS31) [3]. - Plant cytoplasmic S26.

- Fungi S26 [4]. 

These proteins have 114 to 127 amino acids. 
A conserved octapeptide in the central part of these proteins has been selected as a signature pattern.

- Consensus pattern: [YH]-C-V-S-C-A-I-H 

[ 1] Kuwano Y., Nakanishi O., Nabeshima Y., Tanaka T., Ogata K. J. Biochem. 97:983-992(1985).
[ 2] Zinov'eva R.D., Tomarev S.I. Dokl. Akad. Nauk SSSR 304:464-469(1989).
[ 3] Itoh N., Ohta K., Ohta M., Kawasaki T., Yamashina I. Nucleic Acids Res. 17:2121-2121(1989).
[ 4] Wu M., Tan H. Gene 150:401-402(1994).
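
As an illustration of how the S26e description above might be applied, the following minimal Python sketch checks a candidate sequence against both the stated length range (114 to 127 amino acids) and the octapeptide pattern. The function name `looks_like_s26e`, the hand-written regular expression, and the test sequence are assumptions made for this example only; the sequence is synthetic and is not a real S26 protein.

```python
import re

# Hand translation of the consensus pattern [YH]-C-V-S-C-A-I-H into a regex.
S26E_SIGNATURE = re.compile(r"[YH]CVSCAIH")

def looks_like_s26e(sequence: str, min_len: int = 114, max_len: int = 127) -> bool:
    """Return True if the sequence has an S26e-like length and carries the signature.

    The length bounds are taken from the family description and are used here
    only as a coarse filter (an assumption of this sketch), not as a strict rule.
    """
    seq = sequence.upper()
    if not (min_len <= len(seq) <= max_len):
        return False
    return S26E_SIGNATURE.search(seq) is not None

# Synthetic 118-residue sequence (hypothetical, not a real protein).
candidate = "A" * 55 + "YCVSCAIH" + "A" * 55
print(looks_like_s26e(candidate))
```

Combining the two criteria is a design choice of the sketch: the pattern alone may also occur in unrelated sequences, and the length range alone says nothing about the conserved octapeptide.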

[1408] 555. Ribosomal protein S28e signature 

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities. 
One of these families consists of:

- Mammalian S28 [1]. - Plant S28 [2]. - Fungi S33 [3].
- Methanococcus jannaschii MJ1202.

These proteins have from 64 to 78 amino acids.

A highly conserved nonapeptide from the C-terminal extremity of these proteins has been selected as a signature 
pattern. 

- Consensus pattern: E-[ST]-E-R-E-A-R-x-L 

[ 1] Chan Y.-L., Olvera J., Wool I.G.

Biochem. Biophys. Res. Commun. 179:314-318(1991).

[ 2] Hwang I., Goodman H.M. Plant Physiol. 102:1357-1358(1993).

[ 3] Hoekstra R., Ferreira P.M., Bootsman T.C., Mager W.H., Planta R.J. Yeast 8:949-959(1992).
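
The consensus patterns in these entries use a compact notation: single letters are residues, `[..]` lists allowed residues, `x` is any residue, and a number in parentheses is a repeat count. The sketch below shows one way such a pattern could be turned into a Python regular expression. The helper name `prosite_to_regex` is an assumption of this example, the `{..}` exclusion form is supported although it does not occur in the entries here, and anchors or other extensions of the notation are not handled. The test sequence is made up.

```python
import re

# One pattern element: a residue, a [..] class, a {..} exclusion or 'x',
# optionally followed by a repeat count such as (2) or (4,5).
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")

def prosite_to_regex(pattern: str) -> str:
    """Translate a dash-separated consensus pattern into a regex string."""
    pieces = []
    for element in pattern.strip().split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unrecognised pattern element: {element!r}")
        core, low, high = m.groups()
        if core == "x":
            core = "."                      # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # any residue except those listed
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        pieces.append(core)
    return "".join(pieces)

# The S28e nonapeptide pattern from the entry above.
s28e = prosite_to_regex("E-[ST]-E-R-E-A-R-x-L")
print(s28e)
# Synthetic sequence containing the nonapeptide (not a real protein).
print(bool(re.search(s28e, "MKKESEREARKLAA")))
```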


[1409] 556. Ribosomal protein S3Ae signature 

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities. 
One of these families consists of: 

- Mammalian S3A (was originally known as v-fos transformation effector protein). - Caenorhabditis elegans S3A
(F56F3.5).

- Plant cytoplasmic S3A (CYC07) [1]. - Yeast Rp10 (PLC1 and PLC2).
- Fission yeast Rp10 (SpAC13G6.02c). - Methanococcus jannaschii MJ0980.

[1,2]. One of these families consists of: - Mammalian S19. - Drosophila S19.

These proteins have 143 to 155 amino acids 
Awe„™se„eds, r e,^ 

- Consensus pattern: P-x(6)-[SAN]-x(2)-[LIVMA]-x-R-x-[ALIV]-[LV]-Q-x-L-[EQ]

[ 1] Etter A., Aboutanos M., Tobler H., Mueller F.

Proc. Natl. Acad. Sci. U.S.A. 88:1593-1596(1991).

[ 2] Suzuki K., Olvera J., Wool I.G. Biochimie 72:299-302(1990).

[1400] 550. Ribosomal protein S2 signatures 
- Cyanelle S2. - Archaebacterial S2.

- Higher eukaryotes P40 (previously thought to be a laminin receptor).

- Yeast NAB1. - Plant mitochondrial S2. - Yeast mitochondrial MRP4.

S2 is a protein of 235 to 394 amino-acid residues.

«ZT^Z*r Se,6Cted 33 Si9na,Ure °" e is * ■» N^ermina, section and the 

- Consensus pattern: *«?HUVMFH^^ 

[ 1] Davis S.C., Tzagoloff A., Ellis S.R. 
J. Biol. Chem. 267:5508-5514(1992) 

[ 2] Tohgo A., Takasawa S., Munakata H., Yonekura H., Hayashi N., Okamoto H. FEBS Lett. 340:133-138(1994).
[1401] 551. Ribosomal protein S21 signature 

protern has been selected as a sigiiaure ^n^T^^ residues. A conserved region in the N-termtnal section of the 

- Caenorhabditis elegans S21 (F37C12.11). - Rice S21 [2].

- Yeast S21 (Ys25) [3]. - Fission yeast S28 [4].

These proteins have 82 to 87 amino acids.

A perfectly conserved nonapeptide in the N-terminal part of these proteins has been selected as a signature pattern.

- Consensus pattern: L-Y-V-P-R-K-C-S-[SA] 

[1405] 553. Ribosomal protein S24e signature

- Yeast S18a and S18b (RP41; YS12).



The best conserved regions located in the C-terminal sections of these proteins have been selected as a signature 
pattern. 

- Consensus pattern: G-D-x-[LIV]-x-[LIVA]-x-[QEK]-x-[RK]-P-[LIV]-S

[ 1] Gantt J.S., Thompson M.D. J. Biol. Chem. 265:2763-2767(1990).
[ 2] Herfurth E., Hirano H., Wittmann-Liebold B.
Biol. Chem. Hoppe-Seyler 372:955-961(1991).

[ 3] Otaka E., Hashimoto T., Mizuta K.
Protein Seq. Data Anal. 5:285-300(1993).






[1396] 546. Ribosomal protein S17e signature 

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities. 
One of these families consists of: 

- Vertebrates S17 [1]. - Drosophila S17 [2]. - Neurospora crassa S17 (crp-3). 

- Yeast S17a (RP51A) and S17b (RP51B) [3]. - Methanococcus jannaschii MJ0245.
These proteins have from 63 (in archaebacteria) to 130 to 146 amino acids and are highly conserved. A region in the central part of these proteins
has been selected as a signature.

- Consensus pattern: A-x-I-x-[ST]-K-x-L-R-N-[KR]-I-A-G-[FY]-x-T-H

[ 1] Chen I.-T., Roufa D.J. Gene 70:107-116(1988).
[ 2] Maki C., Rhoads D.D., Stewart M.J., van Slyke B., Denell R.E., Roufa D.J. Gene 79:289-298(1989).
[ 3] Abovich N., Rosbash M. Mol. Cell. Biol. 4:1871-1879(1984).



[1397] 547. Ribosomal protein S18 signature 

Ribosomal protein S18 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S18 has been
implicated in aminoacyl-tRNA binding [1]. It appears to be situated at the tRNA A-site of the ribosome. It belongs to a
family of ribosomal proteins which, on the basis of sequence similarities [2], groups: - Eubacterial S18. - Algal and plant
chloroplast S18. - Cyanelle S18.
As a signature pattern, a conserved region in the central section of the protein has
been selected. This region contains two basic residues which may be involved in RNA-binding.
- Consensus pattern: [IV]-[DY]-Y-x(2)-[LIVMT]-x(2)-[LIVM]-x(2)-[FYT]-[LIVM]-[ST]-[DERP]-x-[GY]-K-[LIVM]-x(3)-R-[LIVMAS]

[ 1] McDougall J., Choli T., Kruft V., Kapp U., Wittmann-Liebold B. FEBS Lett. 245:253-260(1989).
[ 2] Otaka E., Hashimoto T., Mizuta K. Protein Seq. Data Anal. 5:285-300(1993).
[1398] 548. Ribosomal protein S19 signature 

Ribosomal protein S19 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S19 is known to 
form a complex with S13 that binds strongly to 16S ribosomal RNA. S19 belongs to a family of ribosomal proteins 
which, on the basis of sequence similarities [1,2], groups: - Eubacterial S19. 

- Algal and plant chloroplast S19. - Cyanelle S19. - Archaebacterial S19. 
- Plant mitochondrial S19. - Eukaryotic S15 ('rig' protein).

S19 is a protein of 88 to 144 amino-acid residues. Our signature pattern is based on the few conserved positions
located in the C-terminal section of these proteins. 






Consensus pattern: [STDNQ]-G-[KRQM]-x(6)-[LIVM]-x(4)-[LIVM]-[GSD]-x(2)-[LF]-[GAS]-[DE]-F-x(2)-[ST]



[ 1] Kitagawa M., Takasawa S., Kikuchi N., Itoh T., Teraoka H., Yamamoto K., Okamoto H. FEBS Lett. 283:210-214(1991).

[ 2] Otaka E., Hashimoto T., Mizuta K.
Protein Seq. Data Anal. 5:285-300(1993).



[1399] 549. Ribosomal protein S19e signature 

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities

groups:

- Eubacterial S14.

- Algal and plant chloroplast S14.
- Cyanelle S14.

- Archaebacterial Methanococcus vannielii S14.
- Plant mitochondrial S14.

- Yeast mitochondrial MRP2.
- Mammalian S29.

- Yeast YS29A/B.






[1387] S14 is a protein of 53 to 115 amino-acid residues. Our signature pattern is based on the few conserved
positions located in the center of these proteins.

[1388] Consensus pattern: [RP]-x(0,1)-C-x(11,12)-[LIVMF]-x-[LIVMF]-[SC]-[RG]-x(3)-[RN]

[1] Chan Y.-L., Suzuki K., Olvera J., Wool I.G. Nucleic Acids Res. 21:649-655(1993).
[2] Otaka E., Hashimoto T., Mizuta K. Protein Seq. Data Anal. 5:285-300(1993).

[1389] 543. Ribosomal protein S15 signature

ribosomal proteins which, on the basis of sequence similarities [1,2], groups: - Eubacterial S15.

- Archaebacterial Halobacterium marismortui HmaS15 (HS11).

- Plant chloroplast S15. - Yeast mitochondrial S28. - Mammalian S13.

- Brugia pahangi and Wuchereria bancrofti S13 (S15). - Yeast S13 (YS15).

S15 is a protein of 80 to 250 amino-acid residues.

A conserved region located in the C-terminal part of these proteins has been selected as a signature pattern.

- Consensus pattern: IMVM^^ 

[ 1] Dang H., Ellis S.R.
Nucleic Acids Res. 18:6895-6901(1990).
[ 2] Otaka E., Hashimoto T., Mizuta K.
Protein Seq. Data Anal. 5:285-300(1993).

[1390] 544. Ribosomal protein S16 signature

- Eubacterial S16.

- Algal and plant chloroplast S16.

- Cyanelle S16.

- Neurospora crassa mitochondrial S24 (cyt-21).

mSJ cTT ^'^^-'L'^J-I^HStakj-r-x^akr] 

nS [ ^^X^n~ ( ° teh seq Da * ** «»*""* 

codons. .« belongs to a family of r*£SZ™ 5£ 7*" " *" reC09niti0n °» ,e ™ atio " 

Eubacterial S17. P ^' °° me 1,85,8 °' se °."ence similarities [1,2,3]. groups: - 

- Plant chloroplast S17 (nuclear encoded). - Red algal chloroplast S17 

- Cyanelle S17. - Archaebacterial S17. - Mammalian and plant cytoplasmic S11.

- Consensus pattern: [LIVMF]-x-[GSTAC]-[LIVMF]-x(2)-[GSTAL]-x(0,1)-[GSN]-[LIVMF]-x-[LIVM]-x(4)-[DEN]-x-T-P-
x-[PA]-[STCH]-[DN]

[ 1] Kimura M., Kimura J., Hatakeyama T. FEBS Lett. 240:15-20(1988).
[ 2] Otaka E., Hashimoto T., Mizuta K.
Protein Seq. Data Anal. 5:285-300(1993).

[1382] 539. Ribosomal protein S12 signature 

Ribosomal protein S12 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S12 is known to be 
involved in the translation initiation step. It is a very basic protein of 120 to 150 amino-acid residues. S12 belongs to 
a family of ribosomal proteins which, on the basis of sequence similarities [1], groups: - Eubacterial S12. - Archaebacterial S12.

- Algal and plant chloroplast S12. - Cyanelle S12. 

- Protozoa and plant mitochondrial S12. - Yeast S28. 

- Drosophila mitochondrial protein tko (Technical KnockOut). - Mammalian S23.
The best conserved regions in these proteins, located in the center of each sequence, have been selected as a signature pattern.

- Consensus pattern: [RK]-x-P-N-S-[AR]-x-R 

[ 1] Otaka E., Hashimoto T., Mizuta K.
Protein Seq. Data Anal. 5:285-300(1993). 
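
To show how the S12 pattern above could be located within a sequence, here is a small Python sketch that reports where each match begins and ends. The regular expression is a hand translation of [RK]-x-P-N-S-[AR]-x-R; the function name `find_signature` and the short test sequence are hypothetical and serve only as an illustration.

```python
import re

# Hand translation of [RK]-x-P-N-S-[AR]-x-R ('x' becomes '.').
S12_SIGNATURE = re.compile(r"[RK].PNS[AR].R")

def find_signature(sequence: str):
    """Return (start, end, matched_text) tuples with 1-based, inclusive positions."""
    hits = []
    for m in S12_SIGNATURE.finditer(sequence.upper()):
        # m.start() is 0-based; m.end() is exclusive, so it already equals
        # the 1-based inclusive end position.
        hits.append((m.start() + 1, m.end(), m.group()))
    return hits

# Synthetic sequence (not a real S12 protein).
print(find_signature("MGGKTPNSALRVV"))
```

Positions are reported 1-based because residue numbering in protein sequences conventionally starts at 1; `finditer` returns non-overlapping matches only, which is a simplification of this sketch.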
[1383] 540. Ribosomal protein S12e signature 

A number of eukaryotic ribosomal proteins can be grouped on the basis of sequence similarities. One of these families 
consists of: - Vertebrate S12 [1].

- Trypanosoma brucei S12 [2]. - Caenorhabditis elegans S12 (F54E7.2).

- Drosophila S12. - Yeast S12.

These proteins have 130 to 150 amino acids. 

A conserved region in the N-terminal part of these proteins has been selected as a signature pattern. 

- Consensus pattern: A-L-[KRQP]-x-V-L-x(2)-[SA]-x(3)-[DN]-G-L 
[ 1] Lin A., Chan Y.-L., Jones R., Wool I.G. J. Biol. Chem. 262:14343-14351(1987).
[ 2] Marchal C., Ismaili N., Pays E. Mol. Biochem. Parasitol. 57:331-334(1993).
[1384] 541. Ribosomal protein S13 signature

Ribosomal protein S13 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S13 is known to be
involved in binding fMet-tRNA and, hence, in the initiation of translation. It is a basic protein of 115 to 177 amino-acid
residues and belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1,2], groups: -
Eubacterial S13.

- Plant chloroplast S13 (nuclear encoded). - Red algal chloroplast S13.

- Cyanelle S13. - Archaebacterial S13. - Plant mitochondrial S13.
- Mammalian and plant S18.

The best conserved regions in these proteins, located in their C-terminal part have been selected as a signature pattern. 

- Consensus pattern: [KRQS]-G-x-R-H-x(2)-[GSNH]-x(2)-[LIVMC]-R-G-Q 
[ 1] Chan Y.-L., Paz V., Wool I.G.

Biochem. Biophys. Res. Commun. 178:1212-1218(1991).
[ 2] Otaka E., Hashimoto T., Mizuta K.
Protein Seq. Data Anal. 5:285-300(1993).
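
The S13 entry states that the signature lies in the C-terminal part of the protein. As a rough illustration, the Python sketch below finds the pattern and expresses the midpoint of the match as a fraction of the sequence length. The function names, the 0.5 threshold used for "C-terminal half", and the test sequence are all assumptions of this example; the entry itself does not define a numeric cut-off.

```python
import re

# Hand translation of [KRQS]-G-x-R-H-x(2)-[GSNH]-x(2)-[LIVMC]-R-G-Q.
S13_SIGNATURE = re.compile(r"[KRQS]G.RH.{2}[GSNH].{2}[LIVMC]RGQ")

def signature_relative_position(sequence: str):
    """Return the midpoint of the first match as a fraction of sequence length, or None."""
    seq = sequence.upper()
    m = S13_SIGNATURE.search(seq)
    if m is None:
        return None
    midpoint = (m.start() + m.end()) / 2
    return midpoint / len(seq)

def in_c_terminal_half(sequence: str) -> bool:
    """True if the signature is found and its midpoint lies past the middle (assumed threshold 0.5)."""
    pos = signature_relative_position(sequence)
    return pos is not None and pos > 0.5

# Synthetic 100-residue sequence with the signature near the C-terminus.
toy = "A" * 80 + "KGARHAAGAALRGQ" + "A" * 6
print(signature_relative_position(toy), in_c_terminal_half(toy))
```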

[1385] 542. Ribosomal protein S14p/S29e (Ribosomal protein S14 signature) 

[1386] Ribosomal protein S14 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S14 is
known to be required for the assembly of 30S particles and may also be responsible for determining the conformation 
of 16S rRNA at the A site. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1,2],

- Vertebrate L7A (SURF3) [1]. - Plant L7A. - Yeast L7A (YL5) (Rp6).

- Yeast protein NHP2 [2]. - Yeast hypothetical protein YEL026w.

- Bacillus subtilis hypothetical protein ylxQ. - Halobacterium marismortui Hs6.

- Methanococcus jannaschii MJ1203.

[1377] These proteins have 100 to 265 amino-acid residues.

A conserved region located in the central section has been selected as a signature pattern.

- Consensus pattern: [CA]-x(4)-[IV]-P-[FY]-x(2)-[LIVM]-x-[GSQ]-[KRQ]-x(2)-L-G

IJ! ?' n 0 ". 1 K - Ffied M Pf0C - Natl Acad - Sd U.S.A. 89-6358-6362(19921 

[ 2] Kolodrubetz D., Burgum A. Yeast 7:79-90(1991).

[1378] 536. Ribosomal protein L9 signature 

groups: - Eubacterial L9. - Cy^SSiSS P ' 00 *" baS ' S ° f 8 «' UB ™» [1.2J. 

- Plant chloroplast L9 (nuclear-encoded). - Red algal chloroplast L9. 

A conserved region, located in the N-terminal section of these proteins, has been selected as a signature pattern.

- Consensus pattern: G-x(2)-[GN]-x(4)-V-x(2)-G-[FY]-x(2)-N-[FY]-L-x(5)-[GA]-x(3)-[STN]

, Lol^994 D ) W ' D3VieS C - S E ' KyC " J H ' P ° rter S J - Whi,e S W ■ "amakrishnan V. EMBOJ. 1 3: 

[ 2] Otaka E., Hashimoto T., Mizuta K., Suzuki K. Protein Seq. Data Anal. 5:301-313(1993).
[1379] 537. Ribosomal protein S10 signature

similarities [1], groups: - Eubacterial S10.

- Algal chloroplast S10. - Cyanelle S10. - Archaebacterial S10 

S10 is a protein of about 100 amino-acid residues 

[1380, Aca^^ tacB « hta ^ - ^^ ta ^^ Mai ^^ 

- Consensus pattern: [AV]-x(3)-[GDNSR]-[LIVMSTA]-x(3)-G-P-[LIVM]-x-[LIVM]-P-T

[ 1] Otaka E., Hashimoto T., Mizuta K.
Protein Seq. Data Anal. 5:285-300(1993). 
[1381] 538. Ribosomal protein S11 signature 

ribosomal proteins which, on the basis of sequence similarities, groups [2]: - Eubacterial S11.

- Algal and plant chloroplast S11. - Cyanelle S11. - Archaebacterial S11.

- Marchantia polymorpha and Prototheca wickerhamii mitochondrial S11.

- ^^^bacastellaniimita^drjalSll.-Neurosooracrassa^A/^oN v 

- MammaBan. Drosophila. Trypanosoma, and plan. s?4 ^ ' ^ 814 (RP59 w CRY1 > 

- Caenorhabditis elegans S14 (F37C12.9).

One of the best conserved regions in these proteins was selected as a signature pattern.

L5 is a protein of about 180 amino-acid residues.

A conserved region, located in the first third of these proteins has been selected as a signature pattern. 

- Consensus pattern: [LIVM]-x(2HLIVM]-[STAV^ 

[ 1] Hatakeyama T, Hatakeyama T. Biochim. Biophys. Acta 1039:343-347(1990). 
[ 2] Rosendahl G., Andreasen PH., Kristiansen K. Gene 98:161-167(1991). 

[ 3] Yang D., Gunther I., Matheson A.T., Auer J., Spicker G., Boeck A. Biochimie 73:679-682(1991).
[ 4] Otaka E., Hashimoto T., Mizuta K., Suzuki K. Protein Seq. Data Anal. 5:301-313(1993).

[1370] 532. Ribosomal L5P family C-terminus

[1371] This region is found associated with Ribosomal_L5. Number of members: 60
[1372] 533. Ribosomal protein L6 signatures 

[1373] Ribosomal protein L6 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L6 is known 
to bind directly to the 23S rRNA and is located at the aminoacyl-tRNA binding site of the peptidyltransferase center. It
belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1,2,3,4], groups: - Eubacterial L6. 

- Algal chloroplast L6.
- Cyanelle L6.
- Archaebacterial L6.

- Marchantia polymorpha mitochondrial L6.

- Yeast mitochondrial YmL6 (gene MRPL6).
- Mammalian L9.

- Drosophila L9.

- Plants L9.

- Yeast L9 (YL11).

[1374] While all the above proteins are evolutionarily related, it is very difficult to derive a pattern that will find them
all. Two patterns were therefore created, the first to detect eubacterial, cyanelle and mitochondrial L6, the second to
detect archaebacterial L6 as well as eukaryotic L9.

- Consensus pattern: [PS]-[DENS]-x-Y-K-[GA]-K-G-[LIVM]

- Consensus pattern: Q-x(3)-[LIVM]-x(2)-[KR]-x(2)-R-x-F-x-D-G-[LIVM]-Y-[LIVM]-x(2)-[KR]
[1] Suzuki K., Olvera J., Wool I.G. Gene 93:297-300(1990). 

[2] Schwank S., Harrer R, Schueller K-J., Schweizer E. Curr. Genet 24:136-140(1993). 

[3] Golden B.L, Ramakrishnan V, White S.W. EMBO J. 12:4901-4908(1993). 

[4] Otaka E., Hashimoto T, Mizuta K., Suzuki K. Protein Seq. Data Anal. 5:301-313(1993). 
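
Because the L6 family is covered by two separate patterns, a sequence can be assigned to one of the two groups by testing each pattern in turn. The Python sketch below illustrates this under stated assumptions: the regular expressions are hand translations of the two consensus patterns, taking the hydrophobic residue class in each to be [LIVM]; the function name `classify_l6`, the group labels, and the test sequences are hypothetical and synthetic.

```python
import re

# Pattern 1: [PS]-[DENS]-x-Y-K-[GA]-K-G-[LIVM]
L6_PATTERN_1 = re.compile(r"[PS][DENS].YK[GA]KG[LIVM]")
# Pattern 2: Q-x(3)-[LIVM]-x(2)-[KR]-x(2)-R-x-F-x-D-G-[LIVM]-Y-[LIVM]-x(2)-[KR]
L6_PATTERN_2 = re.compile(r"Q.{3}[LIVM].{2}[KR].{2}R.F.DG[LIVM]Y[LIVM].{2}[KR]")

GROUP_1 = "eubacterial, cyanelle or mitochondrial L6"
GROUP_2 = "archaebacterial L6 or eukaryotic L9"

def classify_l6(sequence: str):
    """Return the group label whose pattern is found in the sequence, or None."""
    seq = sequence.upper()
    if L6_PATTERN_1.search(seq):
        return GROUP_1
    if L6_PATTERN_2.search(seq):
        return GROUP_2
    return None

# Synthetic sequences built to match each pattern (not real proteins).
print(classify_l6("MMPDAYKGKGLMM"))
print(classify_l6("MMQAAALAAKAARAFADGLYLAAKMM"))
```

Testing the first pattern before the second is an arbitrary choice of this sketch; a sequence matching both would be reported under the first group only.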

[1375] 534. Ribosomal protein L6e signature 

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities. 
One of these families consists of: 

- Mammalian ribosomal protein L6 (L6 was previously known as TAX-responsive enhancer element binding protein 
107). 

- Caenorhabditis elegans ribosomal protein L6 (R151.3).

- Yeast ribosomal protein YL16A/YL16B.

- Mesembryanthemum crystallinum ribosomal protein YL16-like.

These proteins have 175 (yeast) to 287 (mammalian) amino acids. A highly conserved region in the central part of 
these proteins has been selected as a signature pattern. 

- Consensus pattern: N-x(2)-P-L-R-R-x(4)-[FY]-V-I-A-T-S-x-K
[1376] 535. Ribosomal protein L7Ae signature 

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities. 
One of these families consists of:

A conserved region in the central part of these proteins has been selected as a signature pattern.
- Consensus pattern: P-Y-E-[KR]-R-x-[LIVM]-[DE]-[LIVM](2)-[KR]

[ 1] Chan Y.-L., Paz V., Olvera J., Wool I.G.

Biochem. Biophys. Res. Commun. 192:849-853(1993).

[1366] 528. Ribosomal protein L39e signature 

£ SSESET 6 * 0 "" i "' * oso ™' prae,ns - M " - — - 

- Mammalian L39 - Plants L39. - Yeast L46 12] - Archebacterial t qq» ro, Th 
^-■ong.meyarethesr^^ 

A conserved region in the C-terminal section of these proteins has been selected as a signature pattern.

- Consensus pattern: [KRA]-T-x(3)-[LIVM]-[KRQF]-x-[NHS]-x(3)-R-[NHY]-W-R-R

[ 1] Lin A., McNally J., Wool I.G. J. Biol. Chem. 259:487-490(1984).
[ 2] Leer R.J., van Raamsdonk-Duin M.M.C., Kraakman P., Mager W.H.,
Planta R.J. Nucleic Acids Res. 13:701-709(1985).

[ 3] Ramirez C., Louie K.A., Matheson A.T. FEBS Lett. 250:416-418(1989).

[1367] 529. Ribosomal L40e family 

SSZSZXZ 27^ " 8 SeCOndarV RNA **■ ^ M "0 - *- * a ubMl* prote* [2] . 
[1] 

Medline: 88203200 

R pfe,vsz?k 9 MA 0 n " ? SUbUnit °» b0Vine m ««=hondria. ribosomes. 
natyszek MA, Denslow ND, O'Brien TW; 

Nucleic Acids Res 1988;16:2565-2583 

Chan YL, Suzuki K, Wool IG;
Biochem Biophys Res Commun 1995 215 682-690 
[1368] 530. (Ribosomal L44) Ribosomal protein L44e signature 

cTo^^ 

- Mammalian L44 [1]. - Trypanosoma brucei L44.

- Caenorhabditis elegans L44 (C09H10.2). - Fungal L44 (L41).
- Halobacterium marismortui LA [2].

These proteins have 92 to 105 amino-acid residues 

A conserved region located in the C-terminal part of these proteins has been selected as a signature pattern. 

- Consensus pattern: K-x-[TV]-K-K-x(2)-L-[KR]-x(2)-C 

[ 1] Gallagher M.J., Chan Y.-L., Lin A., Wool I.G. DNA 7:269-273(1988).
[ 2] Bergmann U., Wittmann-Liebold B.
Biochim. Biophys. Acta 1173:195-200(1993).

[1369] 531. Ribosomal protein L5 signature

basis of sequence srnilarities V .2*% t^^^lt*** ' ^ * Pr0teinS *** °" the 

- tSI^T^k ' ^ "^ ane " e L5 ■ Archaebac ^l L5. - Mammalian L11 
- Tetrahymena thermophila L21. - Slime mold L5 (V18). - Yeast L16 (39A).

- Plant mitochondrial L5.

sequence similarities, groups: - Eubacterial L34.

- Red algal chloroplast L34. - Cyanelle L34.
A conserved region that corresponds to the N-terminal half of L34 has been selected as a signature pattern.

- Consensus pattern: K-[RG]-T-[FYWL]-[EQS]-x(5)-[KRHS]-x(4,5)-G-F-x(2)-R

[ 1] Old I.G., Margarita D., Saint Girons I.
Nucleic Acids Res. 20:6097-6097(1992). 
[1362] 524. Ribosomal protein L34e signature 

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities. 
One of these families consists of: 

- Mammalian L34. - Mosquito L31 [1]. - Plant L34 [2].

- Yeast putative ribosomal protein YIL052c. - Methanococcus jannaschii MJ0655. These proteins have 89 to 129 
amino-acid residues. 

A conserved region located in the N-terminal section of these proteins has been selected as a signature pattern. 

- Consensus pattern: Y-x-[ST]-x-S-[NY]-x(5)-[KR]-T-P-G 

[ 1] Lan Q., Niu L.L., Fallon A.M.

Biochim. Biophys. Acta 1218:460-462(1994).

[ 2] Gao J., Kim S.R., Chung Y.Y., Lee J.M., An G.

Plant Mol. Biol. 25:761-770(1994). 

[1363] 525. Ribosomal protein L35Ae signature 

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities. 
One of these families consists of: 

- Vertebrate L35A. - Caenorhabditis elegans L35A (F10E7.7). 

- Yeast L37A/L37B (Rp47). - Pyrococcus woesei L35A homolog [1]. 

These proteins have 87 to 110 amino-acid residues. 

A highly conserved stretch of 22 residues in the C-terminal part of these proteins has been selected as a signature 
pattern. 

- Consensus pattern: G-K-[LIVM]-x-R-x-H-G-x(2)-G-x-V-x-A-x-F-x(3)-[LI]-P

[ 1] Ouzounis C., Kyrpides N., Sander C. 
Nucleic Acids Res. 23:565-570(1995). 
[1364] 526. Ribosomal protein L36 signature 

Ribosomal protein L36 is the smallest protein from the large subunit of the prokaryotic ribosome. It belongs to a family 
of ribosomal proteins which, on the basis of sequence similarities [1], groups: - Eubacterial L36. - Algal and plant 
chloroplast L36. - Cyanelle L36.
L36 is a small basic and cysteine-rich protein of 37 amino-acid residues. As a signature
pattern, a conserved region that corresponds to positions 11 to 36 in L36 and includes three conserved cysteine residues
has been developed.

- Consensus pattern: C-x(2)-C-x(2)-[LIVM]-x-R-x(3)-[LIVMN]-x-[LIVM]-x-C-x(3,4)-[KR]-H-x-Q-x-Q
[ 1] Otaka E., Hashimoto T., Mizuta K. Protein Seq. Data Anal. 5:285-300(1993).
[1365] 527. Ribosomal protein L36e signature 

A number of eukaryotic ribosomal proteins can be grouped on the basis of sequence similarities. One of these families 
consists of: - Mammalian L36 [1].

- Drosophila L36 (M(1)1B). - Caenorhabditis elegans L36 (F37C12.4). 

- Candida albicans L39. - Yeast YL39. 



These proteins have 99 to 104 amino acids.

- Drosophila L7. - Slime mold L7. - Mammalian L7. - Fungi L7 (YL8).

- Yeast mitochondrial L33.

^ T archaeb3Ctefia - of about 1 50 

of proteins is shown betow.Eub .LiNtoSS f 7 ° reS ' dUeS - 7,10 ^^^tionship between the three groups 
Arc. L30 NxxxxxxxxxxxxxxxxxxxxxxxxxxxC 

[ 1] Mizuta K., Hashimoto T., Otaka E.
Nucleic Acids Res. 20:1011-1016(1992). 
[1357] 520. Ribosomal protein L31 signature 

A conserved region located in the central section of these proteins has been selected as a signature pattern.
- Consensus pattern: H-P-F-[FY]-[TI]-x(9)-G-R-[AIV]-x-[KRQ]
[1358] 521. Ribosomal protein L31e signature
Sn^rse^^^ 

- Mammalian L31 [1]. - Chlamydomonas reinhardtii L31. - Yeast L34.
- Halobacterium marismortui HL30 [2].

These proteins have 87 to 128 amino-acid residues.

A conserved region, located in the central section, has been selected as a signature pattern.

- Consensus pattern: V-[KR]-[LIVM]-x(3)-[LIVM]-N-x-[AKH]-x-W-x-[KR]-G

[ 1] Tanaka T., Kuwano Y., Kuzumaki T., Ishikawa K., Ogata K. Eur. J. Biochem. 162:45-48(1987).
[ 2] Bergmann U.

Biochim. Biophys. Acta 1050:56-60(1990).
[1359] 522. Ribosomal protein L33 signature 

TotSe^ 

similarities [1,2,3], groups: - Eubacterial L33.

- Algal and plant chloroplast L33. - Cyanelle L33.

- Consensus pattern: Y-x-[ST]-x-[KR]-[NS]-x(4)-[PATQ]-x(1,2)-[LIVM]-[EA]-x(2)-K-[FY]-[CSD]

! ?! V o»^ P U " Winmann - Li ebold B. Biochimie 73:855-860(1991) 
[ 2] Sharp P.M. Gene 139:129-130(1994).
[ 3] Otaka E., Hashimoto T., Mizuta K.
Protein Seq. Data Anal. 5:285-300(1993). 

[1360] 523. Ribosomal protein L34 signature 

[1]. L34 belongs to a family of ribosomal proteins which, on the basis of

The schematic relationship between these groups of proteins is shown below.

Eub. L27 Nxxxxxxxxx
Algal L27 Nxxxxxxxxx
Plant L27 tttttNxxxxxxxxxxxxx
Yeast MRP7 tttNxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

't': transit peptide.
'N': N-terminal of mature protein.
position of the pattern.

- Consensus pattern: G-x-[LIVM](2)-x-R-Q-R-G-x(5)-G 

[ 1] Elhag G.A., Bourque D.P Biochemistry 31:6856-6864(1992). 
[ 2] Otaka E., Hashimoto T., Mizuta K. 
Protein Seq. Data Anal. 5:285-300(1993). 

[1353] 516. Ribosomal L28 family

The ribosomal L28 family includes L28 proteins from bacteria and chloroplasts. The L24 protein from yeast Swiss:
P36525 also contains a region of similarity to prokaryotic L28 proteins. L24 from yeast is also found in the large ribosomal
subunit.
Number of members: 24 
[1354] 517. Ribosomal protein L29 signature 

Ribosomal protein L29 is one of the proteins from the large ribosomal subunit. L29 belongs to a family of ribosomal 
proteins which, on the basis of sequence similarities [1], groups: - Eubacterial L29. - Red algal L29. 

- Archaebacterial L29. - Mammalian L35. - Caenorhabditis elegans L35 (ZK652.4).

- Yeast L35. 



L29 is a protein of 63 to 138 amino-acid residues. 

A conserved region located in the central section of L29 has been selected as a signature pattern. 

- Consensus pattern: [KNQS]-[PSTL]-x(2)-[LIMFA]-[KRGSAN]-x-[LIWSTA]-[KR]-[KRHQS]-[DESTANRL]-[LIV]-A-
[KRCQVT]-[LIVMA]

[ 1] Otaka E., Hashimoto T., Mizuta K.
Protein Seq. Data Anal. 5:285-300(1993).
[1355] 518. Ribosomal protein L3 signature

Ribosomal protein L3 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L3 is known to bind 
to the 23S rRNA and may participate in the formation of the peptidyltransferase center of the ribosome. It belongs to 
a family of ribosomal proteins which, on the basis of sequence similarities [1,2,3,4], groups: - Eubacterial L3. - Red
algal L3. - Cyanelle L3.

- Archaebacterial Halobacterium marismortui HmaL3 (HL1).

- Yeast L3 (also known as trichodermin resistance protein) (gene TCM1).

- Arabidopsis thaliana L3 (genes ARP1 and ARP2). - Mammalian L3 (L4).

- Mammalian mitochondrial L3. - Yeast mitochondrial YmL9 (gene MRPL9).
A conserved region located in the central section of these proteins has been selected as a signature pattern.

- Consensus pattern: [FL]-x(6)-[DN]-x(2)-[AGS]-x-[ST]-x-G-[KRH]-G-x(2)-G-x(3)-R

[ 1] Arndt E., Kroemer W., Hatakeyama T. J. Biol. Chem. 265:3034-3039(1990).

[ 2] Graack H.-R., Grohmann L., Kitakawa M., Schaefer K.L., Kruft V.

Eur. J. Biochem. 206:373-380(1992).

[ 3] Herwig S., Kruft V., Wittmann-Liebold B.

Eur. J. Biochem. 207:877-885(1992).

[ 4] Otaka E., Hashimoto T., Mizuta K., Suzuki K.

Protein Seq. Data Anal. 5:301-313(1993).

[1356] 519. Ribosomal protein L30 signature 

Ribosomal protein L30 is one of the proteins from the large ribosomal subunit. L30 belongs to a family of ribosomal
proteins which, on the basis of sequence similarities [1], groups: - Eubacterial L30. - Archaebacterial L30.

- Algal and plant chloroplast L23. - Archaebacterial L23. - Mammalian L23A.

- Caenorhabditis elegans L23A (F55D10.2). - Fungi L25. 

- Yeast mitochondrial YmL41 (gene MRPL41 or MRP20). 

Kg ^^JZ££Z2& — " - *- — » «— % in ™A- 

- Consensus pattern: [RK](2)-[AM]-[IVFYT]-[IV]-[RKT]-L-[STANEQK]-x(7)-[LIVMFT]

[ 1] El Baradi T.T.A.L., Raue H.A., van de Regt C.H.F., Verbree E.C.,
Planta R.J. EMBO J. 4:2101-2107(1985).

[ 2] Raue H.A., Otaka E., Suzuki K. J. Mol. Evol. 28:418-426(1989).

[ 3] Fearon K., Mason T.L. J. Biol. Chem. 267:5162-5170(1992).

[ 4] Otaka E., Hashimoto T., Mizuta K.

Protein Seq. Data Anal. 5:285-300(1993).

[1350] 513. Ribosomal protein L24 signature

- Yeast L26 (YL33). - Archaebacterial HmaL24 (HL15).

- A probable ribosomal protein from Sulfolobus acidocaldarius [1].

In their mature form, these proteins have 103 to 150 amino-acid residues.

A conserved stretch of 20 residues in their N-terminal section has been selected as a signature pattern.

- Consensus pattern: PDErfl^x-^^ 

[ 1] Ouzounis C., Kyrpides N., Sander C. 

Nucleic Acids Res. 23:565-570(1995). 

[1351] 514. Ribosomal protein L24e signature

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities.

- Mammalian ribosomal protein L24.

- Yeast ribosomal protein L30A/B (Rp29) (YL21).

- Kluyveromyces lactis ribosomal protein L30.

- Arabidopsis thaliana ribosomal protein L24 homolog.

- Haloarcula marismortui ribosomal protein HL21/HL22.
- Methanococcus jannaschii MJ1201.

snsr ~,: — - — » — » - 

- Consensus pattern: [FY]-x-[GSH]-x(2)-[IV]-x-P-G-x-G-x(2)-[FYV]-x-[KRHE]-x-D

nl^Ta's'^™* J ,' W °° ! ' G BiOChSm Bi0phyS ReS - Cor ™ 202:1176-1180(1994) 
[1352] 515. Ribosomal protein L27 signature

- Plant chloroplast L27 (nuclear-encoded). - Algal chloroplast L27.

- Yeast mitochondrial YmL2 (gene MRPL2 or MRP7).

- Consensus pattern: K-x(3)-[KRC]-x-[LIVM]-W-[IV]-[STNALV]-R-[LIVM]-[NS]-x(3)-[RKHS]

[ 1] Otaka E., Hashimoto T., Mizuta K., Suzuki K.

Protein Seq. Data Anal. 5:301-313(1993).
[1345] 509. Ribosomal protein L21e signature

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities.
One of these families consists of:

- Mammalian L21 [1]. - Entamoeba histolytica L21 [2].

- Caenorhabditis elegans L21 (C14B9.7). - Yeast L21E (URP1) [3].
- Halobacterium marismortui HL31 [4].

These proteins have 160 (eukaryotes) or 95 (archaebacteria) amino-acid residues. A conserved region in the central
part of these proteins has been selected as a signature pattern.

- Consensus pattern: G-[DE]-x-V-x(10)-[GV]-x(2)-[FYH]-x(2)-[FY]-x-G-x-T-G

[ 1] Devi K.R.G., Chan Y.-L., Wool I.G.
Biochem. Biophys. Res. Commun. 162:364-370(1989).
[ 2] Petter R., Rozenblatt S., Nuchamowitz Y., Mirelman D.
Mol. Biochem. Parasitol. 56:329-333(1992).

[ 3] Jank B., Waldherr M., Schweyen R.J. Curr. Genet. 23:15-18(1993).
[ 4] Hatakeyama T., Kimura M. Eur. J. Biochem. 172:703-711(1988).

[1346] 510. Ribosomal protein L21 signature

Ribosomal protein L21 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L21 is known to bind
to the 23S rRNA in the presence of L20. It belongs to a family of ribosomal proteins which, on the basis of sequence
similarities, groups: - Eubacterial L21.

- Marchantia polymorpha chloroplast L21. - Cyanelle L21.
- Spinach chloroplast L21 (nuclear-encoded).

Eubacterial L21 is a protein of about 100 amino-acid residues; the mature form of the spinach chloroplast L21 has 200
residues. A conserved region located in the C-terminal section of these proteins has been selected as a signature
pattern.

- Consensus pattern: [IVT]-x(3)-[KR]-x(3)-[KRQ]-K-x(6)-G-[HF]-R-[RQ]-x(2)-[ST]

[1347] 511. Ribosomal protein L22 signature
Ribosomal protein L22 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L22 is known to bind
23S rRNA. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1,2,3], groups: -
Eubacterial L22.

- Algal and plant chloroplast L22 (in legumes L22 is encoded in the nucleus instead of the chloroplast). - Cyanelle
L22. - Archaebacterial L22.

- Mammalian L17. - Plant L17. - Yeast YL17.

A conserved region located in the C-terminal section of these proteins has been selected as a signature pattern.

- Consensus pattern: [RKQN]-x(4)-[RH]-[GAS]-x-G-[KRQS]-x(9)-[HDN]-[LIVM]-x-[LIVMS]-x-[LIVM]
[ 1] Gantt J.S., Baldauf S.L, Calie P.J., Weeden N.F., Palmer J.D. 

EMBO J. 10:3073-3078(1991 ).[ 2] Madsen L.H., Kreiberg J.D.. Causing K. Curr. Genet. 19 417-422(1991) 
[ 3] Otaka E., Hashimoto T.. Mizuta K., Suzuki K. 
Protein Seq. Data Anal. 5:301-313(1993). 

[1348] 512. Ribosomal protein L23 signature

Ribosomal protein L23 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L23 is known to bind
These proteins have 148 to 203 amino-acid residues. A
stretch of about 20 residues in the N-terminal part of these proteins has been selected as a signature pattern.

- Consensus pattern: ^KR h R-IL.VMH4SA].x(4)- ( C^-x(3 H . \^WKh[LIVF]4DN h P 
[ 1] Chan Y.-L., Lin A., McNally J., Peleg D., Meyuhas O., Wool I.G.
J. Biol. Chem. 262:1111-1115(1987).
[ 2] Hart K., Klein T., Wilcox M.
Mech. Dev. 43:101-110(1993).
[ 3] Singleton C.K., Manning S.S., Ken R.
Nucleic Acids Res. 17:9679-9692(1989).
[1342] 506. Ribosomal protein L1e signature (Ribosomal_L4)

- Fission yeast L2. - Halobacterium marismortui HmaL4 (HL6).

- Methanococcus jannaschii MJ0177.

- Consensus pattern:

[ 1] Rafti F., Gargiulo G., Manzi A., Malva C., Graziani F.

Nucleic Acids Res. 17:456-456(1989).

[ 2] Presutti C., Villa T., Bozzoni I.

Nucleic Acids Res. 21:3900-3900(1993).

[ 3] Bagni C., Mariottini P., Annesi F., Amaldi F.

Biochim. Biophys. Acta 1216:475-478(1993).

[ 4] Arndt E., Kroemer W., Hatakeyama T. J. Biol. Chem. 265:3034-3039(1990).
[1343] 507. Ribosomal protein L2 signature

Sr^ 

basis of sequence similarities (1 .2], 2o^!S^SSl1 ^ * * " P'**™ °" *• 

' £2 f? 'S* Chloro P las, U - - ^lle L2. - Archaebacterial L2 

- 21^™ T ^ ^ ~ PO'ymorpha mitochondria. L2. 

a^SrTSr Te9i0n ,0Ca,ed ^ - — ^ been seiected as 

- Consensus pattern: P-x(2)-R-G-[STAIV](2)-x-N-[APK]-x-[DE]
[ 1] Marty I., Meyer Y.

Nucleic Acids Res. 20:1517-1522(1992).

[ 2] Otaka E., Hashimoto T., Mizuta K., Suzuki K.

Protein Seq. Data Anal. 5:301-313(1993).

[1344] 508. Ribosomal protein L20 signature

W: ■ BteMM 120. . pJ!2C£S —*0»-l»*« sWMie , <„ 

- Cyanelle L20. 

Sirr,^ 9 ^~ esw '' ,s A - ■— to,M " - — — - — 
[1336] 501. Ribosomal protein L17 signature

Ribosomal protein L17 is one of the proteins from the large ribosomal subunit. L17 belongs to a family of ribosomal
proteins which, on the basis of sequence similarities, groups: - Eubacterial L17.

- Yeast mitochondrial YmL8 (gene MRPL8).

Eubacterial L17 is a protein of 120 to 130 amino-acid residues. Yeast YmL8 is twice as large (238 residues); the sequence
of its N-terminal half is colinear with that of eubacterial L17. As a signature pattern, a conserved region in the N-terminal
section was selected.

- Consensus pattern: l-x-[ST]-[GT]-x(2HKR]-x-K-x(6H^ 
[1337] 502. Ribosomal protein L18e signature 

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities. 
One of these families consists of: 

- Vertebrate L18 (known as L14 in Xenopus) [1]. - Plant L18.

- Yeast L18 (Rp28). - Halobacterium marismortui HL29.
- Sulfolobus acidocaldarius HL29e.

These proteins have 115 to 187 amino-acid residues. A stretch of about 13 residues in the first third of these proteins
has been selected as a signature pattern.

- Consensus pattern: [KRE]-x-L-x(2)-[PS]-[KR]-x(2)-[RH]-[PSA]-x-[LIVM]-[NS]-[LIVM]-x-[RK]-[LIVM]

[ 1] Puder M., Barnard G.R., Staniunas R.J., Steele G.D. Jr., Chen L.B.
Biochim. Biophys. Acta 1216:134-136(1993).
[1338] 503. Ribosomal L18p family

It has been shown that the amino terminal 93 amino acids of Swiss:P09895 are necessary and sufficient to bind 5S 
rRNA in vitro. The carboxyl-terminal half of the protein, comprising amino acids 151-296, serves to localize the protein
to the nucleolus [1]. 
Number of members: 26 
[1] 

Medline: 96212235 

Distinct domains in ribosomal protein L5 mediate 5 S rRNA binding and nucleolar localization. 
Michael WM, Dreyfuss G; 
J Biol Chem 1996;271:11571-11574. 
[1339] 504. Ribosomal protein L19 signature

Ribosomal protein L19 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L19 is known to be
located at the 30S-50S ribosomal subunit interface and may play a role in the structure and function of the aminoacyl-
tRNA binding site. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities, groups: -
Eubacterial L19.

- Red algal chloroplast L19. - Cyanelle L19.

L19 is a protein of 120 to 130 amino-acid residues.

A conserved region in the C-terminal section has been selected as a signature pattern.

- Consensus pattern: [LIVM]-x-[KRGTI]-x-[GSAI]-[KRQDA]-[VG]-[RSN]-x(0,1)-[KR]-[SA]-[KY]-[KLI]-[LYS]-Y-[LIM]-R

[1340] 505. Ribosomal protein L19e signature

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities.
One of these families consists of:

- Mammalian ribosomal protein L19 [1]. - Drosophila ribosomal protein L19 [2].
- Slime mold (D. discoideum) vegetative specific protein V14 [3].

- Yeast ribosomal protein L19 (YL14). - Archaebacterial ribosomal protein L19E.
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- Consensu, patent [KF5-Y-x(2H<-|UVMJ-R 1 STA]^KRl- G -f.|S1>L-».E 

^S^^^C^7^,^l ! r^ ~ L14 is known lo M 
■ Con-nsus p„„ : |GAHUVE(3 W9 . ro „ DN s W .x ( 4 HF V > xBHNT,.,( 2 )-V 1 UV| 

Euiaaeria LIS. . PkM dtoo^w LIS fM.JS*'** ™ ** 038,8 01 88< "~« e ■**"«•• IH 9«V^ • 

•' t^l 5 ™ 5 L,S PBraL,S - Y " SlYL,0 <-''*-'' 

• Cons.„ s „, WW. [DEHKRJ-A-R-x-L-G-|FY]-x-|SAPf-x(2)H3-|LIVMFY](4)-fl-x-R-(IV]-j(.R-G 

[ 1] Zwick! P., Lupas A, Baumeister W. 
Biochem. Biophys. Res. Commun. 209:684-668(1995). 
It is located at the end of an alpha helix thought to be involved in RNA-binding.
[1314] Consensus pattern: [IMhx(2HUVA]-x(2,3)-[UV™^ 
[DENSTKQ] 

[ 1] Nikonov S.V., Nevskaya N., Eliseikina I.A., Fomenkova N.P., Nikulin A., Ossina N., Garber M., Jonsson B.-H.,
Briand C., Al-Karadaghi S., Svensson L.A., Aevarsson A., Liljas A. EMBO J. 15:1350-1359(1996).
[ 2] Olvera J., Wool I.G. Biochem. Biophys. Res. Commun. 220:954-957(1996).

[1315] 492. Ribosomal protein L10 signature

Ribosomal protein L10 is one of the proteins from the large ribosomal subunit. L10 is a protein of 162 to 185 amino-
acid residues which has only been found so far in eubacteria. A conserved region located in the N-terminal section of
these proteins was used as a signature pattern.

[1316] Consensus pattern: [DEH]-x(2)-[GS]-[LIVMF]-[STN]-[VA]-x-[DEQK]-[LIVMA]-x(2)-[LIM]-R
[1317] 493. Ribosomal protein L10e signature 

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities.
One of these families consists of: - Vertebrate L10 (QM) [1]. - Plant L10. - Caenorhabditis elegans L10 (F10B5.1). -
Yeast L10 (QSR1). - Methanococcus jannaschii MJ0543. These proteins have 174 to 232 amino-acid residues. A conserved
region located in the central section was selected as a signature pattern.
[1318] Consensus pattern: R-x-A-[FYW]-G-K-[PA]-x-G-x(2)-A-R-V

[ 1] Chan Y.-L., Diaz J.-J., Denoroy L., Madjar J.-J., Wool I.G. Biochem. Biophys. Res. Commun. 225:952-956(1996).

[1319] 494. Ribosomal protein L11 signature 

[1320] Ribosomal protein L11 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L11 is 
known to bind directly to the 23S rRNA. It belongs to a family of ribosomal proteins which, on the basis of sequence 
similarities [1,2], groups: 



- Eubacterial L11.

- Plant chloroplast L11 (nuclear-encoded).

- Red algal chloroplast L11.

- Cyanelle L11.

- Archaebacterial L11.

- Mammalian L12.
- Plant L12.
- Yeast L12 (YL15).



[1321] L11 is a protein of 140 to 165 amino-acid residues. A conserved region located in the C-terminal section of 
these proteins was selected as a signature pattern. In Escherichia coli, the C-terminal half of L11 has been shown [3] 
to be in an extended and loosely folded conformation and is likely to be buried within the ribosomal structure. 
[1322] Consensus pattern: [RKN]-x-[LIVM]-x-G-[ST]-x(2)-[SNQ]-[LIVM]-G-x(2)-[LIVM]-x(0,1)-[DENG]

[ 1] Pucciarelli G., Remacha M., Ballesta J.P.G.; Nucleic Acids Res. 18:4409-4416(1990).
[ 2] Otaka E., Hashimoto T., Mizuta K., Suzuki K.; Protein Seq. Data Anal. 5:301-313(1993).
[ 3] Choli T. Biochem. Int. 19:1323-1338(1989).

[1323] 495. Ribosomal protein L7/L12 C-terminal domain 
[1324] [1] Leijonmarck M, Liljas A; J Mol Biol 1987;195:555-579. 
[1325] 496. Ribosomal protein L13 signature

Ribosomal protein L13 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L13 is known to be
one of the early assembly proteins of the 50S ribosomal subunit. It belongs to a family of ribosomal proteins which, on
the basis of sequence similarities [1], groups: - Eubacterial L13.

- Plant chloroplast L13 (nuclear-encoded). - Red algal chloroplast L13.
- Archaebacterial L13. - Mammalian L13a (Tum P198). - Yeast Rp22 and Rp23.

[1326] L13 is a protein of 140 to 250 amino-acid residues. As a signature pattern, a conserved region was selected,
located in the C-terminal section of these proteins.

[1327] Consensus pattern: [LIVM]-[KRV>[GK]-M-[LI\^^ 
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about 300 amino-acid L^I^S^S^ T ^ " 3 mi,oc ^rial enzyme of 

takespart in the catalytic me^^ 

asuisferaseac^^^^ 

erythraea cysA f 4J. - aZ*ccc^JS^SSi^SSi ^ iff ^ W " ^^spora 
transport of sulfur compounds Two pattern .for ?2£££L^ •! 3 per,plasrn,c P^ 8 '" P^ry involved « the 
conserved regions, one which b ta£dhto 5 ^ deV£>toped - ^ are tesed °" hi 9hry 
[1308] ConLsus ^^^^^5^5^ *" 81 *" C * m * - * e " 2 *™ 

Consensus pattern: [FYHDEAF]-G-[SA]-W-x-E-[FYW] 

[ 1] Westley J. Meth. Enzymol. 77:285-291(1981). 

[ 2] Weiland K.L, Dooley T.R Biochem. J. 275:227-231(1991) 

[ 3] Rudd K. E. Unpublished observations (1 993) 

[4] DonadbS ShafieeA., Hutchinson CA J. Bacteriol. 172:350-360(1990) 

[ 5] Laudenbach D.E, Ehrhardt D.. Green L, Grossman A.R J. Bacteriol. 173:2751-2760(1991). 

[1309] 489. Ribonuclease III family signature

proteins: - Fission yeast pad a ribonucle^ thTnTn^! k^" evolutionary related [2] to the following 

required for sexuaf deve£en^V ScS J^t^^^S^ ? de9radin9 3 ^ mRNA 
karyotic preribosomal RNA at various sites (Si^hJST • u ds l RNA - s P ec,f,c nucle ase that cleaves eu- 
bursara ch.orella virus 1 pJ^XK S^^^lf^T^ T? ' PmMm 

hypothetical protein SpAC8A4 08c a protein SS33^IS f "VP 0 **"^ P~te.n slr0346. - Fission yeast 

Caenorhabdftis elegans hy^^o^K^TtT? ^ 3 ° Mnal RNase 111 **«*. - 

teins share mgtor-VSE^ h P ? ^ *" 8a ™ SU UC,Ur9 38 S PAC8A4.08c.These pro- 

developed as a signlrittem ^ " * ^ SM of 9 res * ues **" "as been 

[1310] Consensus pattern: [DEQ]-[RQ]-[LM]-E-[FYW]-[LV]-G-D-[SAR]

[ 1] Nashimoto H., Uchida H. Mol. Gen. Genet. 201:25-29(1985).
[ 2] Mian I.S. Nucleic Acids Res. 25:3187-3195(1997).

[1311] 490. Rieske iron-sulfur protein signatures 
Ubiquinol-cytochrome c reductase (EC 1 10 2 2\ faiv> i,n™», =, u . 

transport chains of mitochondria and oHoSe^ o^n, T " ,} fe 008 01 the electron 

cytochrome c. In the chloropfcst ^Zl^l^^T', ^ *" OXidoreduc,to " °< "bk,uino. and 
(also known as theb6f complexes func"SsST a S 
f. One of the components S tnes^^^ 

called the Rieske protein [1 21 The Rtesta i ™* Ul,U ' Pf ° te,n Wi,h a 2Fe " 2S cluster . is 

cluster is cornplexLo thi p « residues. The iron-sulfur 

proteins contains al, the resKfuesCbind 2TSS?SS SfS- ^ COnSe,Ved h RiBSke 

The first cysteine and the histidine are 2Fe 2S Znn! 1^ ^ ! 9 C °" ,a,n ,WO cysteines and a his «<*™- 

conserved regions were J£^J£££Z ** ** a bond |3,. Two 

Sd ? a ru^eCdT C - fTK, - H - L - G - C -' UVST J ^ ft* C and the H are 2Fe-2S ligands, (The second C is 
C« S P« 

I a 2?' F T L - M "i nhardt S W - """"M T.. Tzagoloff A. J. Mol. Biol. 205:421-435(1989) 
2 Kallas T, Spiller S.. Malkin R. Prcc. Natl. Acad. Sci. U S A 85 5794-579^7988? 
[ 3] Iwata S.. Saynovits M., Link T.A.. Michel H. Structure 4:56759(1996) ( > ' 

[1313] 491. Ribosomal protein L1 signature

• Eubaae™, LI. - «„1 and »a n, ^C^T c™!*-, 0 " TJH", I 1 - 21 ^ 
sets of genes. Each class of RNA polymerase is an assemblage of ten to twelve different polypeptides. In archaebacteria,
there is generally a single form of RNA polymerase which also consists of an oligomeric assemblage of 10 to 13
polypeptides. A component of 14 to 18 Kd shared by all three forms of eukaryotic RNA polymerases and which has
been sequenced in budding yeast (gene RPB6 or RPO26), in fission yeast (gene rpb6 or rpo15), in human and in African
swine fever virus [1] is evolutionarily related [2] to archaebacterial subunit K (gene rpoK). The archaebacterial protein
is colinear with the C-terminal part of the eukaryotic subunit.
[1299] Consensus pattern: [ST]-x-[FY]-E-x-[AT]-R-x-[LIVM]-[GSA]-x-R-[SA]-x-Q

[ 1] Lu Z., Kutish G.F., Sussman M.D., Rock D.L. Nucleic Acids Res. 21:2940-2940(1993).
[ 2] McKune K., Woychik N.A. J. Bacteriol. 176:4754-4756(1994).

[1300] 483. RNA polymerases L / 13 to 16 Kd subunits signature

In eukaryotes, there are three different forms of DNA-dependent RNA polymerases (EC 2.7.7.6) transcribing different
sets of genes. Each class of RNA polymerase is an assemblage of ten to twelve different polypeptides. In archaebacteria,
there is generally a single form of RNA polymerase which also consists of an oligomeric assemblage of 10 to 13
polypeptides. It has been shown that small subunits of about 13 to 16 Kd found in all three types of eukaryotic polymerases
are highly conserved. Subunits known to belong to this family are: - Budding yeast RPC19 subunit from RNA
polymerases I and III [1]. - Budding yeast RPB11 subunit from RNA polymerase II [2]. - Mammalian RPB11 (gene
POLR2K) from RNA polymerase II. - Caenorhabditis elegans hypothetical protein F58A4.9. - Methanococcus jannaschii
RNA polymerase subunit L (gene rpoL). - Sulfolobus acidocaldarius RNA polymerase subunit L (gene rpoL) [3]. As a
signature pattern a conserved region was selected which is located at the N-terminal extremity of these polymerase
subunits; this region contains two cysteines that could play a role in the binding of a metal ion.
[1301] Consensus pattern: [DE](2)-H-[ST]-[LIVM]-[GAP]-N-x(11)-V-x-[FM]-x(2)-Y-x(3)-H-P

[ 1] Dequard-Chablat M., Riva M., Carles C., Sentenac A. J. Biol. Chem. 266:15300-15307(1991).
[ 2] Woychik N.A., McKune K., Lane W.S., Young R.A. Gene Expr. 3:77-82(1993).
[ 3] Langer D. EMBL/GenBank: X70805.

[1302] 484. RNA polymerases N / 8 Kd subunits signature 

In eukaryotes, there are three different forms of DNA-dependent RNA polymerases (EC 2.7.7.6) transcribing different
sets of genes. Each class of RNA polymerase is an assemblage of ten to twelve different polypeptides. In archaebacteria,
there is generally a single form of RNA polymerase which also consists of an oligomeric assemblage of 10 to 13
polypeptides. Archaebacterial subunit N (gene rpoN) [1] is a small protein of about 8 Kd; it is evolutionarily related [2]
to an 8.3 Kd component shared by all three forms of eukaryotic RNA polymerases (gene RPB10 in yeast and POLR2J
in mammals) as well as to African swine fever virus protein CP80R [3]. As a signature pattern a conserved region was
selected which is located at the N-terminal extremity of these polymerase subunits; this region contains two cysteines
that could play a role in the binding of a metal ion.
[1303] Consensus pattern: [LIVMF](2)-P-[LIVM]-x-C-F-[ST]-C-G

[ 1] Langer D., Hain J., Thuriaux P., Zillig W. Proc. Natl. Acad. Sci. U.S.A. 92:5768-5772(1995).
[ 2] McKune K., Woychik N.A. J. Bacteriol. 176:4754-4756(1994).

[ 3] Yanez R.J., Rodriguez J.M., Nogal M.L., Yuste L., Enriquez C., Rodriguez J.F., Vinuela E. Virology 208:249-278(1995).

[1304] 485. Ribonuclease HII 

[1] Mian IS; Nucleic Acids Res 1997;25:3187-3189. 

[1305] 486. Ribonuclease PH signature 

Prokaryotic ribonuclease PH (EC 2.7.7.56) (RNase PH) [1] is a phosphorolytic exoribonuclease that removes nucleotide
residues following the -CCA terminus of tRNA and adds nucleotides to the ends of RNA molecules by using nucleoside
diphosphates as substrates. RNase PH is a conserved protein of about 240 amino-acid residues. It is evolutionarily
related to Caenorhabditis elegans hypothetical protein B0564.1. As a signature pattern, the most highly conserved
region was selected which is located in the central part of these proteins.

Consensus pattern: C-[DE]-[LIVM](2)-Q-[GTA]-D-G-[SG]-x(2)-[TA]-A

[ 1] Kelly K.O., Deutscher M.P. J. Biol. Chem. 267:17153-17158(1992).

[1306] 487. RanBP1 domain

[1] Di Matteo G, Fuschi P, Zerfass K, Moretti S, Ricordy R, Cenciarelli C, Tripodi M, Jansen-Durr P, Lavia P; Cell Growth
Differ 1995;6:1213-1224.

[1307] 488. Rhodanese signatures 
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[4J. Th« toxin i Lposi" S^JS^^^ZT r ^ ^ ^ Shi9elb 
for binding the tox^Smplex to sS 

Escheriche coli toxins Jy similar in S^^^S^^T? -rf*"* 9 ,0XinS (SLT) are a 9 rou P <* 
toxins. SLT-1 [5]andSLT-2[6] is l^ow S 

sylated chains' Led by a d E ffid TbTd T^e A S 1^^.^°; "t" - * "* ^ * «"° *«~ 
preference for galactoses Both chafcs are * The B chain is a lectin with a binding 

II ribosome-inaS^ 

from the seeds of the bean Abrus pecatorius ETSS^JS 9 *" ^ "° m Ca " tor bean ' ^ abrin 

Exarnp.esofsuchproteinsare^^^ 

gourd luffin-A and -B, garden fbur-cftbck M^^^^r^loT^ ^^^^^^ s P°"° e 
a-structura.*^^^ 

near a conserved arginine which also plays a role in ^12 roT PJ !. ^ mecnan ' s ">; « * located 

proteins includes these cata^ic rSues * W 7,19 S,9natUr9 ** been devel °P ed *» 

M.R.. Minami Y., Sung-Sil K., S M^rZe^l ^^SSffi?!/ , k R, T ,a ' 
M., Holmes R.K.. 0>Brien A.D. J. Bactertol 1701116-1 Sia» H riliSS^. ' ^ * P " Sung L 
Keusch G.T.. Mekalanos J.J. Prcc Natl Acad Sci USA IfHHL^T^ &B * AUClair F - ^^e-Rolfe A.. 
A.D., Holmes R.K.. New.and J.W FeSs £££ Lett £ loimS m ^1' ^ N9i " ^ ^ 

chim. Biophys. Acta 1154 237-282(19931 f 81 Hovril r , r « JL J'i 1 BUbma L> Battelli M G - Stir P e F - B«c- 
Acad. Sci U.S. A 85 2568*5 7 Xh 9 E a p C^*™** SB - Mokatano. J.J.. Collier RJ. Proc. Natl. 
Biol. 235:705-715(1993) ( ) I ' 9 ° ^ EJ - EmSt SR ' ,rvin JD - J.D. J. MoL 

[1294] 479. Bacterial RNA polymerase, alpha chain (RNA pol A bac) 

domains. The amino terminal doTint^^^^^ 
carooxyMerminaldc^inintera^^ 

consented in prokaryoticandchloroplast ^aZ^Z ^Z^I^ T*** °' ,he a ' pha SUbunit * 

.wo^the amino-terrninal and one in J^tS?^ - ^^""^ 

AK^uVS^r^ 

[1295] 480. RNA polymerase beta subunit (RNA pol B)

complex contains two related members ol th s fZto TSi C , h J 0rop,as, Phrases). Each RNA polymerase 
D. Dwomtezak B, Faust DM, ~ ,ar9eSt " Fa,kenbu * 

[1296] 481 . RNA polymerases H / 23 Kd subunits signature 

In eukaryotes, there are three different forms of DNA-dependent RNA polymerases (EC 2.7.7.6) transcribing different
sets of genes. Each class of RNA polymerase is an assemblage of ten to twelve different polypeptides. In archaebacteria,
there is generally a single form of RNA polymerase which also consists of an oligomeric assemblage of 10 to 13
polypeptides. Archaebacterial subunit H (gene rpoH) [1] is a small protein of about 8.5 to 10 Kd; it is evolutionarily
related to the C-terminal part of a 23 Kd component shared by all three forms of eukaryotic RNA polymerases (gene
RPB5 in yeast and POLR2E in mammals).

theN-terminal extremity of subunT^SoL ^,1 ? P TV COnserved ^ «» selected which is located at 
[1297, Consensus 

! iiST k . K u - Palm P - L °" SPeiCh F - Z " ,i9 W P,OC Na « Sci U S A 89 407-410(1992^ 

[2]Th,ruA..HodachM.. E ,oran teJ .a,Kosto Ur ouV..Weinzier,Ra 

[1298] 482. RNA polymerases K / 14 to 18 Kd subunits signature 

In eukaryotes, there are three different forms of DNA-dependent RNA polymerases (EC 2.7.7.6) transcribing different
(Schematic representation of RCC1: N-terminal region, repeats 1 to 7, C-terminal domain in Drosophila only.)

Two signature patterns for RCC1 were developed. The first is found in the N-terminal part
of the second repeat; this is the most conserved part of RCC1. The second is derived from conserved positions in the
C-terminal part of each repeat and detects up to five copies of the repeated domain. The RCC1-type of repeat is also
found in the X-linked retinitis pigmentosa GTPase regulator [3].

[1282] Consensus pattern: G-x-N-D-x(2)-[AV]-L-G-R-x-T

Consensus pattern: [LIVMFA]-[STAGC](2)-G-x(2)-H-[STAGLI]-[LIVMFA]-x-[LIVM]

[ 1] Dasso M. Trends Biochem. Sci. 18:96-101(1993).

[ 2] Boguski M.S., McCormick F. Nature 366:643-654(1993).

[ 3] Roepman R., Van Duijnhoven G., Rosenberg T., Pinckers A.J.L.G., Bleeker-Wagemakers L.M., Bergen A.A.B.,
Post J., Beck A., Reinhardt R., Ropers H.-H., Cremers F., Berger W. Hum. Mol. Genet. 5:1035-1041(1996).

[1283] 474. RNA 3'-terminal phosphate cyclase signature (RCT)

RNA 3'-terminal phosphate cyclase (EC 6.5.1.4) [1,2] catalyzes the conversion of 3'-phosphate to a 2',3'-cyclic
phosphodiester at the end of RNA. The biological role of this enzyme is unknown but it is likely to function in some aspects
of cellular RNA processing. The reaction catalyzed by the enzyme occurs in three steps: 1) adenylation of the enzyme
by ATP; 2) the enzyme acts on the RNA 3'-terminal phosphate to produce RNA 3'-terminal diphosphate adenylate; 3) release
of AMP and cyclisation by a non-catalytic nucleophilic attack by the adjacent 2'-hydroxyl on the phosphorus in
the diester linkage. This enzyme, which has been characterized in human (where there seem to be at least three
isozymes) and Escherichia coli (gene rtcA), seems to be taxonomically widespread. It is found in insects, plants, fungi
(gene RTC1 in yeast) and in archaebacteria. RNA cyclase is a protein of 36 to 42 Kd. The best conserved region,
which is used as a signature pattern, is a glycine-rich stretch of residues located in the central part of the sequence
and which is reminiscent of various ATP, GTP or AMP glycine-rich loops. In this context, the conserved Arg (His in the
E. coli enzyme) could be the AMP-binding residue.
[1284] Consensus pattern: [RH]-G-x(2)-P-x-G(3)-x-[LIV]

[ 1] Genschik P., Billy E., Swianiewicz M., Filipowicz W. EMBO J. 16:2955-2967(1997).
[ 2] Filipowicz W., Vincente O. Meth. Enzymol. 181:499-510(1990).

[1285] 475. REV protein (anti-repression trans-activator protein) 

[1286] 476. Prokaryotic-type class I peptide chain release factors signature (RF-1) 

Peptide chain release factors (RFs) are required for the termination of protein biosynthesis [1]. At present two classes
of RFs can be distinguished. Class I RFs bind to ribosomes that have encountered a stop codon at their decoding site
and induce release of the nascent polypeptide. Class II RFs are GTP-binding proteins that interact with class I RFs
and enhance class I RF activity. In prokaryotes there are two class I RFs that act in a codon-specific manner [2]: RF-1
(gene prfA) mediates UAA- and UAG-dependent termination while RF-2 (gene prfB) mediates UAA- and UGA-dependent
termination. RF-1 and RF-2 are structurally and evolutionarily related proteins which have been shown [3] to make up
a family that also contains the following proteins: - Fungal MRF1, a mitochondrial RF (m-RF) which recognizes the
UAA and UAG codons. - Escherichia coli RF-H, a protein of unknown function. - Escherichia coli hypothetical protein
yaeJ and a close Pseudomonas putida homolog. A highly conserved region located in the central part of the 40 to 45
Kd RF-1/2 and m-RF and in the N-terminal part of the 15 to 16 Kd RF-H and yaeJ is used as a signature pattern.
[1287] Consensus pattern: [AR]-[STA]-x-G-x-G-G-Q-[HNGCS]-V-N-x(3)-[ST]-A-[IV]

Note that prokaryotic-type class I RFs display no significant sequence similarity to prokaryotic-type class II which belong 
to the family of GTP-binding elongation factors nor to eukaryotic class I or class II RFs. 

[ 1] Tate W.P., Poole E.S., Mannering S.M. Prog. Nucleic Acids Res. Mol. Biol. 52:293-335(1996).
[ 2] Craigen W.J., Lee C.C., Caskey C.T. Mol. Microbiol. 4:861-865(1990).
[ 3] Pel H.J., Rep M., Grivell L.A. Nucleic Acids Res. 20:4423-4428(1992).
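
By way of illustration only, a signature such as the class I release factor pattern may be applied to a candidate protein sequence with an ordinary regular expression. The sketch below is not part of the described subject matter: the names RF1_SIGNATURE and find_signature and the short test sequence are hypothetical, and the regular expression is a hand translation of the pattern as read here ([AR]-[STA]-x-G-x-G-G-Q-[HNGCS]-V-N-x(3)-[ST]-A-[IV]). A lookahead is used so that overlapping hits, if any, are also reported.

```python
import re

# Hand translation of [AR]-[STA]-x-G-x-G-G-Q-[HNGCS]-V-N-x(3)-[ST]-A-[IV].
# The outer lookahead (?=...) lets finditer report overlapping occurrences.
RF1_SIGNATURE = re.compile(r"(?=([AR][STA].G.GGQ[HNGCS]VN.{3}[ST]A[IV]))")


def find_signature(sequence: str):
    """Return (1-based start position, matched residues) for every hit."""
    return [
        (m.start() + 1, m.group(1))
        for m in RF1_SIGNATURE.finditer(sequence.upper())
    ]


# Hypothetical test sequence built around one copy of the motif.
toy_sequence = "MK" + "RSSGAGGQHVNKTDSAV" + "RLTHL"
for start, residues in find_signature(toy_sequence):
    print(start, residues)
```

A sequence lacking the motif simply yields an empty list; a hit indicates only that the signature is present, not that the protein is a functional release factor.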

[1288] 477. RIO1/ZK632.3/MJ0444 family signature 

The following uncharacterized proteins are evolutionarily related [1]: - Yeast protein RIO1. - Caenorhabditis elegans
hypothetical protein ZK632.3. - Methanococcus jannaschii hypothetical protein MJ0444. - Thermoplasma acidophilum
hypothetical protein in rpoA2 3' region. The eukaryotic members of this family are proteins of about 55 to 60 Kd, while
the archaebacterial ones are half that size. The central part of these proteins is highly conserved. The best conserved
region is used as a signature pattern.

[1289] Consensus pattern: [LIVM]-V-H-[GA]-D-L-S-E-[FY]-N-x-[LIVM]
[1290] [ 1] Bairoch A. Unpublished observations (1997). 
[ 1] Rivett A.J. Biochem. J. 291:1-10(1993).

[ 2] Rivett A.J. Arch. Biochem. Biophys. 268:1-8(1989).

[ 3] Goldberg A.L., Rock K.L. Nature 357:375-379(1992).

[ 4] Wilk S. Enzyme Protein 47:187-188(1993).

[ 5] Hilt W., Wolf D.H. Trends Biochem. Sci. 21:96-102(1996).

[ 6] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:19-61(1994).

[1277] 471. (pyr redox) Pyridine nucleotide-disulphide oxidoreductases class-I active site

sequence and structural simSes J1 freTeST, « \T "*"** * ,he SMe - °" *• »«*• <* 
together the following enzyme! Sto^S 

reductase (EC 1.6 4.5V - Trypanothione reductas^EC^ m f ( f " " euka, y° tes thioredoxin 

rr ent el *5S^^ tne E3 

the two cysteines involved in the redox-active disuffid« h™rt 1 I 1 1 > Tne sequence around 

[1278] Consensus pattern: Q^SSJSSS?? Xt?^! S^f" ^ 38 3 Signa,Ure pattem - 

P^it^sSandTc^thepanemallknc^sZ^ ^ * e Ste dtau " i * 
and torn cyanotactorfa^ 

[ 1J Kurtyan J., Krishna T.S.R., Wong L, Guenther B.. Pahler A., Williams C.H. Jr., Mode. P. Nature 352:172-174 

I S BmlnNi' ?*"!? o** Qm * AR * BioL 174:483-496(1984). 
| 3J Brown N.L. Trends Biochem. Sci. 10 400-402(1985) 

! sj ^SS^i^u'ST ' AfCh - Bi0Chem - BfophyS - 268:409-425(1989). 

« S D v^ M " N3deaU K TrSnds Biochem - 16:305-309(1991). 

6 Gasdaska P.Y.. Gasdaska J R.. Cochran S.. Powis G. FEBS Lett. 373 5-9(19951 
[ 7J Cre,ssen G., Edwards EA. Enard C, Wellbum A.. Mullineaux p £ J SS-131 (1991). 

[1.2.3,4], especiallylnmevicinityonhe ^ re9ions * similarity 

(PLP) group. These enzymes are: - G.utar^^^^ 

glutamate into the neurotransmitter GABA U JZ! , \ { J~ 1§) (GAD) - C^zes the decarboxylation of 
ryzes the decarboxy.atbn of hTs 2n^^^ (EC41JJ2) ( hZ). Cata- 

PLPasacofactor (found in Gram-ne^trve SeTaa^dTm^ ^^r 6 ^ 6 " ^ °' HDC: ,hose ,hat use 

(TyrDC) which converts tyrosineintotvramine , a >" " yf ° S,nB decarbox ylase (EC 4.1.1.25 1 
are collectively known asg^ 

[ 1] Jackson F.R. J. Mol. Evol. 31 325-329(1990) 

IKSSSJS^^ ^ C - FenS,errnaChef BehrendSe " ME - Proc. 

! 2 f^rKr E - ^ Tl • ° hriS,en R Eur J - Biochem 221:997-1002(1994) 

[ 4) >sh„ S„ M.zug.hi H., Nishino J.. Hayashi H.. Kagamiyama H. J. BicLem ,20:369-376(1996). 

[1281] 473. Regulator of chromosome condensation (RCC1) signatures (RCC1)

role in ,he regu.ation of gene expression. RCC . known as WP20 o ^RM^n 1 T T I****'**" an important 
Drosophila. is a protein that contains seven tandem ^o^,; of » i!l , . V ' *"* " " SSiGn yeast and "» 
the following schematic representa.^ To repeat ma ktu n h p T 50 to 60 ™<*. As shown in 

repeat region, there is just a small N^rrnina of^/Ik , ^ °' * e lenQ,h ° f P ' otei " 0uts «* *° 

C-termina. domain of about 130 residue **** 40 '° 50 reS ' dUeS and - in *» D «*°^ P^tein only, a 

+- + + + + + + |N-t.lRpt. 
which belong to different phyla (ranging from fungi to mammals) is low, but the N-terminal region is relatively well
conserved. That region is thought to be involved in the binding to actin. The signature pattern for profilin is based on
conserved residues at the N-terminal extremity. A protein structurally similar to profilin is present in the genome of
variola and vaccinia viruses (gene A42R).

[1268] Consensus pattern: <x(0,1)-[STA]-x(0,1)-W-[DENQH]-x-[YI]-x-[DEQ]

[ 1] Haarer B.K., Brown S.S. Cell Motil. Cytoskeleton 17:71-74(1990).
[ 2] Sohn R.H., Goldschmidt-Clermont P. BioEssays 16:465-472(1994).
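
The leading "<" in the profilin pattern restricts the signature to the N-terminal extremity of the sequence. As a hedged illustration of that anchoring, the sketch below hand-translates the pattern into a regular expression and tests it only at the first residue. It assumes the pattern reads <x(0,1)-[STA]-x(0,1)-W-[DENQH]-x-[YI]-x-[DEQ] (the final class is taken to be [DEQ], since O is not a standard residue code); the names PROFILIN_NTERM and has_nterminal_signature and both test sequences are hypothetical.

```python
import re

# Hand translation of <x(0,1)-[STA]-x(0,1)-W-[DENQH]-x-[YI]-x-[DEQ].
# x(0,1) becomes ".?"; the "<" anchor is supplied by re.match below.
PROFILIN_NTERM = re.compile(r".?[STA].?W[DENQH].[YI].[DEQ]")


def has_nterminal_signature(sequence: str) -> bool:
    """True if the signature starts at the first residue of the sequence."""
    # re.match only attempts a match at position 0, mirroring the "<" anchor.
    return PROFILIN_NTERM.match(sequence.upper()) is not None


# Hypothetical sequences: the motif at the N-terminus, then the same motif
# shifted four residues into the chain.
print(has_nterminal_signature("MSWQAYVDEHLMG"))
print(has_nterminal_signature("GGGGMSWQAYVDEHLMG"))
```

Using an unanchored search instead would also accept the second sequence, which is why the anchor matters for a signature defined at the N-terminal extremity.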

[1269] 468. Protamine P1 signature 

Protamines are small, highly basic proteins that substitute for histones in sperm chromatin during the haploid phase
of spermatogenesis. They pack sperm DNA into a highly condensed, stable and inactive complex. There are two
different types of mammalian protamine, called P1 and P2. P1 has been found in all species studied, while P2 is
sometimes absent. There seems to be a single type of avian protamine whose sequence is closely related to that of
mammalian P1 [1]. As a signature for this family of proteins, a conserved region was selected at the N-terminal extremity
of the sequence.

[1270] Consensus pattern: [AV]-R-[NFY]-R-x(2,3)-[ST]-x-S-x-S

[1271] [ 1] Oliva R., Goren R., Dixon G.H. J. Biol. Chem. 264:17627-17630(1989). 
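
The consensus patterns in this section follow a common notation: residues are separated by hyphens, x stands for any residue, square brackets list permitted residues, braces list excluded residues, and a number or range in parentheses gives a repeat count. The following minimal sketch shows one way such a pattern could be translated mechanically into a regular expression. It is an illustration under stated assumptions rather than a description of any tool used herein: the function name prosite_to_regex and the short test peptide are hypothetical, and N- or C-terminal anchors are not handled.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a hyphen-separated consensus pattern into a regex string."""
    parts = []
    # Drop a trailing period or stray hyphens, then walk the elements.
    for element in pattern.strip().rstrip(".").strip("-").split("-"):
        element = element.strip()
        # Separate an optional repeat suffix such as (3) or (2,3).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"cannot parse element: {element!r}")
        core, lo, hi = m.group(1), m.group(2), m.group(3)
        if core == "x":
            core = "."                      # any residue
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"  # excluded residues
        # Bracketed classes and single letters are already valid regex.
        if lo is not None:
            count = lo if hi is None else f"{lo},{hi}"
            core += "{" + count + "}"
        parts.append(core)
    return "".join(parts)


regex = prosite_to_regex("[AV]-R-[NFY]-R-x(2,3)-[ST]-x-S-x-S")
peptide = "MARYRCCRSQSRSRYY"  # hypothetical test peptide
match = re.search(regex, peptide)
print(regex)
print(match.group(0) if match else "no match")
```

The same function can be applied to the other patterns listed here, provided any residual transcription errors in a pattern are corrected first.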

[1272] 469. Sperm histone P2 (protamine P2) 

This protein, also known as protamine P2, can substitute for histones in the chromatin of sperm. The alignment contains
both the sequence of the mature P2 protein and its propeptide.
[1273] 470. Proteasome A-type subunits signature 

The proteasome (or macropain) (EC 3.4.99.46) [1 to 5,E1] is a eukaryotic and archaebacterial multicatalytic proteinase
complex that seems to be involved in an ATP/ubiquitin-dependent nonlysosomal proteolytic pathway. In eukaryotes the
proteasome is composed of about 28 distinct subunits which form a highly ordered ring-shaped structure (20S ring) of
about 700 Kd. Most proteasome subunits can be classified, on the basis of sequence similarities, into two groups, A
and B. Subunits that belong to the A-type group are proteins of 210 to 290 amino acids that share a number of
conserved sequence regions. Subunits that are known to belong to this family are listed below. - Vertebrate subunits
C2 (nu), C3, C8, C9, iota and zeta. - Drosophila PROS-25, PROS-28.1, PROS-29 and PROS-35. - Yeast C1 (PRS1),
C5 (PRS3), C7-alpha (Y8) (PRS2), Y7, Y13, PRE5, PRE6 and PUP2. - Arabidopsis thaliana subunits alpha and PSM30.
- Thermoplasma acidophilum alpha-subunit. In this archaebacterium the proteasome is composed of only two different
subunits. As a signature pattern for proteasome A-type subunits the best conserved region was selected, which is
located in the N-terminal part of these proteins.

[1274] Consensus pattern: [FY]-x(4)-[STNV]-x-[FYW]-S-P-x-G-[RKH]-x(2)-Q-[LIVM]-[DE]-Y-[SAD]-x(2)-[SAG].
These proteins belong to family T1 in the classification of peptidases [6,E2].

[ 1] Rivett A.J. Biochem. J. 291:1-10(1993).

[ 2] Rivett A.J. Arch. Biochem. Biophys. 268:1-8(1989).

[ 3] Goldberg A.L., Rock K.L. Nature 357:375-379(1992).

[ 4] Wilk S. Enzyme Protein 47:187-188(1993).

[ 5] Hilt W., Wolf D.H. Trends Biochem. Sci. 21:96-102(1996).

[ 6] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:19-61(1994).

[1275] Proteasome B-type subunits signature 

The proteasome (or macropain) (EC 3.4.99.46) [1 to 5,E1] is a eukaryotic and archaebacterial multicatalytic proteinase 
complex that seems to be involved in an ATP/ubiquitin-dependent nonlysosomal proteolytic pathway. In eukaryotes 
the proteasome is composed of about 28 distinct subunits which form a highly ordered ring-shaped structure (20S ring) 
of about 700 Kd. Most proteasome subunits can be classified, on the basis of sequence similarities, into two groups, 
A and B. Subunits that belong to the B-type group are proteins of from 190 to 290 amino acids that share a number of 
conserved sequence regions. Subunits that are known to belong to this family are listed below. - Vertebrate subunits 
C5, beta, delta, epsilon, theta (C10-II), LMP2/RING12, C13 (LMP7/RING10), C7-I and MECL-1. - Yeast PRE1, PRE2 
(PRG1), PRE3, PRE4, PRS3, PUP1 and PUP3. - Drosophila L(3)73Ai. - Fission yeast pts1. - Thermoplasma acidophilum 
beta-subunit. In this archaebacterium the proteasome is composed of only two different subunits. As a signature 
pattern for proteasome B-type subunits the best conserved region was selected, which is located in the N-terminal part 
of these proteins. 

[1276] Consensus pattern: [LIVMA]-[GSA]-[LIVMF]-x-[FYLVGAC]-x(2)-[GSACFYI]-[LIVMSTAC](3)-[GAC]- 
[GSTACV]-[DES]-x(15)-[RK]-x(12,13)-G-x(2)-[GSTA]-D. These proteins belong to family T1 in the classification of 
peptidases [6,E2]. 
[...] proteins of about 70 to 85 Kd. As a signature pattern, a region was selected that is rich in aromatic 
residues and which is located in the C-terminal section. 

[1261] Consensus pattern: [DE]-G-S-W-x-G-x-W-[GA]-[LIVM]-x-[FY]-x-Y-[GA] 

[...] and expressed both in normal and infected cells. It has a tendency to aggregate [...]. [...] of a signal 
peptide, followed by an N-terminal [...] (PHGGGWGQ in mammals, PHNPGY in chicken), [...] domain 
post-translationally removed when [...]. 

[Schematic representation of the structure: tandem repeats, two conserved cysteines, GPI anchor. 'C': conserved 
cysteine involved in a disulfide bond. '*': position of the pattern.] 

[...] alanine- and glycine-rich region [...] as well as a region centered on the second cysteine 
involved in the disulfide bond. 

[1264] Consensus pattern: A-G-A-A-A-A-G-A-V-V-G-G-L-G-G-Y 

Consensus pattern: [...]-Q-Y [C is involved in a disulfide bond] 

[ 1] Stahl N., Prusiner S.B. FASEB J. 5:2799-2807(1991). 

[ 2] Brunori M., Chiara Silvestrini M., Pocchiari M. Trends Biochem. Sci. 13:309-313(1988). 
[ 3] Prusiner S.B. Annu. Rev. Microbiol. 43:345-374(1989). 

[...] (EC 5.2.1.8) (PPIase or rotamase). PPIase is an enzyme 
that accelerates protein folding by catalyzing [...] in oligopeptides 
[2]. It is probable that CSA [...] PPIase. [...] a cytosolic 
protein which belongs to a family [3,4,5] that also [...]: - [...] B [...], a 
PPIase which is retained in an [...]. - Cyclophilin C, a cytoplasmic PPIase. - Mitochondrial 
matrix cyclophilin (cyp3). - A PPIase which [...] 
protein anchored by a C-terminal [...] characterized in Drosophila (gene 
ninaA). - Bacterial [...]. - Natural-killer cell cyclophilin-related 
protein. This [...] in the function of NK cells. [...] - Mammalian nucleoporin Nup358 [6], a nuclear 
pore complex protein of 358 Kd that [...]. - [...] YJR032W. - Fission yeast [...]. - Caenorhabditis elegans 
hypothetical protein T27D1.1. The sequences of the [...] are conserved. [...] a signature [...] region [...]. 

- FKBP's, a [...] PPIases, but their sequence is not at all 
related to that of cyclophilin. 

[ 1] Stamnes M.A., Rutherford S.L., Zuker C.S. Trends Cell Biol. 2:272-276(1992). 
[ 2] Fischer G., Schmid F.X. Biochemistry 29:2205-2212(1990). 

[ 3] Trandinh C.C., Pao G.M., Saier M.H. Jr. FASEB J. 6:3410-3420(1992). 
[ 4] Galat A. Eur. J. Biochem. 216:689-707(1993). 

[ 5] Hacker J., Fischer G. Mol. Microbiol. 10:445-456(1993). 

[ 6] Wu J., Matunis M.J., Kraemer D., Blobel G., Coutavas E. J. Biol. Chem. 270:14209-14213(1995). 
[1267] 467. Profilin signature 
[ 2] Svendsen I., Boisen S., Hejgaard J. Carlsberg Res. Commun. 47:45-53(1982). 

[ 3] Nozawa H., Yamagata H., Aizono Y., Yoshikawa M., Iwasaki T. J. Biochem. 106:1003-1008(1989). 

[ 4] Cleveland T.E., Thornburg R.W., Ryan C.A. Plant Mol. Biol. 8:199-207(1987). 

[ 5] Lee J.S., Brown W.E., Graham J.S., Pearce G., Fox E.A., Dreher T.W., Ahern K.G., Pearson G.D., Ryan C.A. 
Proc. Natl. Acad. Sci. U.S.A. 83:7277-7281(1986). 

[ 6] Seemuller U., Eulitz M., Fritz H., Strobl A. Hoppe-Seyler's Z. Physiol. Chem. 361:1841-1846(1980). 
[ 7] Zeng F.-Y., Qian R.-Q., Wang Y. FEBS Lett. 234:35-38(1988). 

[1258] 463. (pp binding) Phosphopantetheine attachment site 

Phosphopantetheine (or pantetheine 4'phosphate) is the prosthetic group of acyl carrier proteins (ACP) in some mul- 
tienzyme complexes where it serves as a 'swinging arm' for the attachment of activated fatty acid and amino-acid 
groups [1]. Phosphopantetheine is attached to a serine residue in these proteins [2]. ACP proteins or domains have 
been found in various enzyme systems which are listed below (references are only provided for recently determined 
sequences). - Fatty acid synthetase (FAS), which catalyzes the formation of long-chain fatty acids from acetyl-CoA, 
malonyl-CoA and NADPH. Bacterial and plant chloroplast FAS are composed of eight separate subunits which correspond 
to the different enzymatic activities; ACP is one of these polypeptides. Fungal FAS consists of two multifunctional 
proteins, FAS1 and FAS2; the ACP domain is located in the N-terminal section of FAS2. Vertebrate FAS consists of a 
single multifunctional enzyme; the ACP domain is located between the beta-ketoacyl reductase domain and the C- 
terminal thioesterase domain [3]. - Polyketide antibiotics synthase enzyme systems. Polyketides are secondary me- 
tabolites produced from simple fatty acids, by microorganisms and plants. ACP is one of the polypeptide components 
involved in the biosynthesis of Streptomyces polyketide antibiotics actinorhodin, curamycin, granatacin, monensin, 
oxytetracycline and tetracenomycin C. - Bacillus subtilis putative polyketide synthases pksK, pksL and pksM which 
respectively contain three, five and one ACP domains. - The multifunctional 6-methylsalicylic acid synthase (MSAS) 
from Penicillium patulum. This is a multifunctional enzyme involved in the biosynthesis of a polyketide antibiotic and 
which contains an ACP domain in the C-terminal extremity. - Multifunctional mycocerosic acid synthase (gene mas) 
from Mycobacterium bovis. - Gramicidin S synthetase I (gene grsA) from Bacillus brevis. This enzyme catalyzes the 
first step in the biosynthesis of the cyclic antibiotic gramicidin S. - Tyrocidine synthetase I (gene tycA) from Bacillus 
brevis. The reaction carried out by tycA is identical to that catalyzed by grsA. - Gramicidin S synthetase II (gene grsB) 
from Bacillus brevis. This enzyme is a multifunctional protein that activates and polymerizes proline, valine, ornithine 
and leucine. GrsB contains four ACP domains. - Erythronolide synthase proteins 1, 2 and 3 from Saccharopolyspora 
erythraea, which are involved in the biosynthesis of the polyketide antibiotic erythromycin. Each of these proteins contains 
two ACP domains. - Conidial green pigment synthase from Aspergillus nidulans. - ACV synthetase from various fungi. 
This enzyme catalyzes the first step in the biosynthesis of penicillin and cephalosporin. It contains three ACP domains. 
- Enterobactin synthetase component F (gene entF) from Escherichia coli. This enzyme is involved in the ATP-depend- 
ent activation of serine during enterobactin (enterochelin) biosynthesis. - Cyclic peptide antibiotic surfactin synthase 
subunits 1, 2 and 3 from Bacillus subtilis. Subunits 1 and 2 contain three related domains while subunit 3 only contains 
a single domain. - HC-toxin synthetase (gene HTS1) from Cochliobolus carbonum. This enzyme synthesizes HC-toxin, 
a cyclic tetrapeptide. HTS1 contains four ACP domains. - Fungal mitochondrial ACP [9], which is part of the respiratory 
chain NADH dehydrogenase (complex I). - Rhizobium nodulation protein nodF, which probably acts as an ACP in the 
synthesis of the nodulation Nod factor fatty acyl chain. The sequence around the phosphopantetheine attachment site 
is conserved in all these proteins and can be used as a signature pattern. A profile was also developed that spans the 
complete ACP-like domain. 

[1259] Consensus pattern: [DEQGSTALMKRH]-[LIVMFYSTAC]-[GNQ]-[LIVMFYAG]-[DNEKHS]-S-[LIVMST]-{PCFY}- 
[STAGCPQLIVMF]-[LIVMATN]-[DENQGTAKRHLM]-[LIVMWSTA]-[LIVGSTACR]-x(2)-[LIVMFA] [S is the pantetheine 
attachment site] 
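
As a non-limiting illustration, the annotated residue of such a pattern (here the serine) can be located by wrapping it in a capturing group. The sketch below assumes Python; the names PP_SITE and attachment_sites are hypothetical, the ninth position of the pattern is read as [STAGCPQLIVMF] (an assumption where the printed text is unclear), and the test fragment is invented for the example. Note how the excluded set {PCFY} becomes a negated character class.

```python
import re

# Assumed reading of the consensus pattern; position 9 is taken as [STAGCPQLIVMF].
PP_SITE = re.compile(
    r"[DEQGSTALMKRH][LIVMFYSTAC][GNQ][LIVMFYAG][DNEKHS]"
    r"(S)"                        # the serine carrying the prosthetic group
    r"[LIVMST][^PCFY]"            # {PCFY}: any residue except P, C, F or Y
    r"[STAGCPQLIVMF][LIVMATN][DENQGTAKRHLM]"
    r"[LIVMWSTA][LIVGSTACR].{2}[LIVMFA]"
)


def attachment_sites(sequence: str) -> list[int]:
    """Return 1-based positions of candidate attachment-site serines."""
    return [m.start(1) + 1 for m in PP_SITE.finditer(sequence)]


# Invented fragment for illustration; not a sequence of the disclosure.
TEST_FRAGMENT = "MKVDLGADSLDTVELVMALEE"
sites = attachment_sites(TEST_FRAGMENT)
```

Replacing the residue two positions after the serine with proline, which the pattern excludes, removes the hit in this sketch.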

[ 1] Concise Encyclopedia Biochemistry, Second Edition, Walter de Gruyter, Berlin New-York (1988). 
[ 2] Pugh E.L., Wakil S.J. J. Biol. Chem. 240:4727-4733(1965). 

[ 3] Witkowski A., Rangan V.S., Randhawa Z.I., Amy C.M., Smith S. Eur. J. Biochem. 198:571-579(1991). 

[ 6] Scotti C., Piatti M., Cuzzoni A., Perani P., Tognoni A., Grandi G., Galizzi A., Albertini A.M. Gene 130:65-71(1993). 

[ 9] Sackmann U., Zensen R., Rohlen D., Jahnke U., Weiss H. Eur. J. Biochem. 200:463-469(1991). 
[1260] 464. (Prenyltrans) Terpene synthases signature 

The following enzymes catalyze mechanistically related reactions which involve the highly complex cyclic rearrangement 
of squalene or its 2,3 oxide: - Lanosterol synthase (EC 5.4.99.7) (oxidosqualene-lanosterol cyclase), which 
catalyzes the cyclization of (S)-2,3-epoxysqualene to lanosterol, the initial precursor of cholesterol, steroid hormones 
and vitamin D in vertebrates and of ergosterol in fungi (gene ERG7). - Cycloartenol synthase (EC 5.4.99.8) (2,3-epox- 
XXCCXXXXXXXXXXXCXXXXXXXXXCX[...]CCXXCXXXXXCXXXXXX 

'C': conserved cysteine involved in a disulfide bond. '*': position of the pattern. 

[ 1] [...], Zeikus R.D., Gray W.R. Arch. Biochem. Biophys. 238:18-29(1985). 

[ 2] Bohlmann H., Clausen S., [...] Reimann-Philipp U., Schrader G., Barkholt V., Apel K. EMBO J. 7:1559-1565(1988). 

[ 3] Bohlmann H., Apel K. Mol. Gen. Genet. 207:446-454(1987). 

[ 4] Teeter M.M., Mazer J.A., L'Italien J.J. Biochemistry 20:5437-5443(1981). 

[1255] 461. Polyprenyl synthetases signatures 

[...] including cholesterol, dolichol, ubiquinone or coenzyme Q. In bacteria [...] 
sugar carrier [...]. Among [...] are a number of polyprenyl synthetase enzymes 
which catalyze a 1'4-condensation [...]. [...] these 
enzymes is known: - Eukaryotic farnesyl pyrophosphate synthetase (FPP synthetase) (EC 2.5.1.1 / EC 2.5.1.10) which 
catalyzes the sequential condensation of [...]. [...] synthetase is a cytoplasmic 
dimeric enzyme. - Prokaryotic farnesyl [...]. - [...] octaprenyl diphosphate 
synthase (gene ispB). - [...] geranylgeranyl pyrophosphate 
synthetase (GGPP synthetase) (EC 2.5.1.1 / EC 2.5.1.10 / EC 2.5.1.29) which catalyzes the sequential 
addition of the three molecules of IPP onto DMAPP [...]. [...] GGPP synthetase 
is a chloroplast enzyme involved in [...]. [...] Neurospora crassa (gene al-3), 
this enzyme is involved in the biosynthesis of [...] 
biosynthesis of carotenoids (gene crtE). [...] cyanelle genome of Cyanophora 
paradoxa. - Eukaryotic hexaprenyl pyrophosphate synthetase [...] involved in the biosynthesis of coenzyme Q 
and which catalyzes the formation of [...] 
and 10 isoprene units depending on [...] membrane-associated enzyme. It 
has been shown [1 to 5] that all [...]. Two of these regions 
are rich in aspartic-acid residues and could [...]. 
Signature patterns were developed for both regions. [...] Bacillus 
subtilis spore germination protein C [...]. Both proteins are most probably also enzymes involved in isoprenoid 
metabolism [6]. 

[1256] Consensus pattern: [LIVM](2)-x-D-D-x(2,4)-D-x(4)-R-R-[GH] 
Consensus pattern: [LIVMFY]-G-x(2)-[FYL]-Q-[LIVM]-x-D-D-[LIVMFY]-x-[DNG] 
[ 1] Ashby M.N., Edwards P.A. J. Biol. Chem. 265:13157-13164(1990). 

[ 6] Bairoch A. Unpublished observations (1993). 
[1257] 462. Potato inhibitor I family signature 

[...] in response to mechanical damage [4,5]. [...] It is interesting 
to note that, currently, this is the only [...] in plant and animal [...]. Structurally 
these inhibitors are small [...] other families of [...], they lack 
disulfide bonds. They have a [...]. [...] includes [...] four residues [...] 
conserved in all members of this family [...] subtilisin-chymotrypsin inhibitor-2b has [...] 
potato inhibitor I family [...]. 

[ 1] Svendsen I., Hejgaard J., Chavan J.K. Carlsberg Res. Commun. 49:493-502(1984). 
[1246] Consensus pattern: [DN]-[LIV]-Y-x(3)-Y-Y-R [The second Y is the autophosphorylation site] 
[1247] [ 1] Yarden Y., Ullrich A. Annu. Rev. Biochem. 57:443-478(1988). 
[1248] Receptor tyrosine kinase class III signature 

A number of growth factors stimulate mitogenesis by interacting with a family of cell surface receptors which possess 
an intrinsic, ligand-sensitive, protein tyrosine kinase activity [1]. These receptor tyrosine kinases (RTK) all share the 
same topology: an extracellular ligand-binding domain, a single transmembrane region and a cytoplasmic kinase do- 
main. However they can be classified into at least five groups. The class III RTK's are characterized by the presence 
of five to seven immunoglobulin-like domains [2] in their extracellular section. Their kinase domain differs from that of 
other RTK's by the insertion of a stretch of 70 to 100 hydrophilic residues in the middle of this domain. The receptors 
currently known to belong to class III are: - Platelet-derived growth factor receptor (PDGF-R). PDGF-R exists as a 
homo- or heterodimer of two related chains: alpha and beta [3]. - Macrophage colony stimulating factor receptor (CSF-1-R) 
(also known as the fms oncogene). - Stem cell factor (mast cell growth factor) receptor (also known as the kit 
oncogene). - Vascular endothelial growth factor (VEGF) receptors Flt-1 and Flk-1/KDR [4]. - Fl cytokine receptor 
Flk-2/Flt-3 [5]. - The putative receptor Flt-4 [6]. A signature pattern was developed for this class of RTKs which is based 
on a conserved region in the kinase domain. 
[1249] Consensus pattern: G-x-H-x-N-[LIVM]-V-N-L-L-G-A-C-T 

[ 1] Yarden Y., Ullrich A. Annu. Rev. Biochem. 57:443-478(1988). 

[ 2] Hunkapiller T., Hood L. Adv. Immunol. 44:1-63(1989). 

[ 3] Lee K.-H., Bowen-Pope D.F., Reed R.R. Mol. Cell. Biol. 10:2237-2246(1990). 

[4] Terman B.I., Dougher-Vermazen M., Carrion M.E., Dimitrov D., Armellino D.C., Gospodarowicz D., Boehlen 
P. Biochem. Biophys. Res. Commun. 187:1579-1586(1992). 

[ 5] Lyman S.D., James L., Vanden Bos T., de Vries P., Brasel K., Gliniak B., Hollingsworth L.T., Picha K.S., McKenna 
H.J., Splett R.R. Cell 75:1157-1167(1993). 

[ 6] Galland F., Karamysheva A., Pebusque M.J., Borg J.P., Rottapel R., Dubreuil P., Rosnet O., Birnbaum D. 
Oncogene 8:1233-1240(1993). 

[1250] Receptor tyrosine kinase class V signatures 

A number of growth factors stimulate mitogenesis by interacting with a family of cell surface receptors which possess 
an intrinsic, ligand-sensitive, protein tyrosine kinase activity [1]. These receptor tyrosine kinases (RTK) all share the 
same topology: an extracellular ligand-binding domain, a single transmembrane region and a cytoplasmic kinase domain. 
However they can be classified into at least five groups on the basis of sequence similarities. The extracellular 
domain of class V RTK's consists of a region of about 300 amino acids, amongst which are 16 conserved cysteines probably 
involved in disulfide bonds; this region is followed by two copies of a fibronectin type III domain. The ligands for these 
receptors are proteins of about 200 to 300 residues collectively known as Ephrins. The receptors currently known to 
belong to class V are [2,3,E1]: - EPHA1 (Eph-1; Esk). - EPHA2 (Eck; Mpk-5; Sek-2). - EPHA3 (Etk-1; Hek; Mek4; 
Tyro4; Rek4; Cek4). - EPHA4 (Sek; Hek8; Mpk-3; Cek8). - EPHA5 (Ehk-1; Hek7; Bsk; Cek7). - EPHA6 (Ehk-2). - 
EPHA7 (Ehk-3; Hek11; Mdk-1; Ebk). - EPHA8 (Eek). - EPHB1 (Eph-2; Elk; Net). - EPHB2 (Eph-3; Hek5; Drt; Erk; Nuk; 
Sek-3; Cek5; Qek5). - EPHB3 (Hek-2; Mdk-5). - EPHB4 (Htk; Mdk-2; Myk-1). - EPHB5 (Cek9). The EPHA subtype 
receptors bind to GPI-anchored ephrins while the EPHB subtype receptors bind to type-I membrane ephrins. Two 
signature patterns were developed for this class of RTK's, which each include some of the conserved cysteine residues. 
[1251] Consensus pattern: F-x-[DN]-x-[GAW]-[GA]-C-[LIVM]-[SA]-[LIVM](2)-[SA]-[LV]-[KRHQ]-[LIVA]-x(3)-[KR]-C- 
[PSAW] [The two C's are probably involved in disulfide bonds] 

Consensus pattern: C-x(2)-[DE]-G-[DEQ]-W-x(2,3)-[PAQ]-[LIVMT]-[GT]-x-C-x-C-x(2)-G-[HFY]-[EQ] [The three C's are 
probably involved in disulfide bonds] 
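
For illustration, the second pattern above contains a variable-length gap, x(2,3), which maps onto a bounded repetition in a regular expression. The following sketch assumes Python; the names EPH_CYS and conserved_cysteines are hypothetical, the final two positions are read as [HFY]-[EQ] (an assumption where the printed text is unclear), and the test fragments are invented. It reports the 1-based positions of the three cysteines annotated as probably involved in disulfide bonds.

```python
import re

# C-x(2)-[DE]-G-[DEQ]-W-x(2,3)-[PAQ]-[LIVMT]-[GT]-x-C-x-C-x(2)-G-[HFY]-[EQ]
# Each annotated cysteine is wrapped in its own capturing group.
EPH_CYS = re.compile(
    r"(C).{2}[DE]G[DEQ]W.{2,3}[PAQ][LIVMT][GT].(C).(C).{2}G[HFY][EQ]"
)


def conserved_cysteines(sequence):
    """Return 1-based positions of the three pattern cysteines, or None."""
    m = EPH_CYS.search(sequence)
    if m is None:
        return None
    return tuple(m.start(i) + 1 for i in (1, 2, 3))


# Invented fragments: the first has a two-residue gap after W, the second three.
TWO_GAP = "CAADGDWAAPLGACACAAGHE"
THREE_GAP = "CAADGDWAAAPLGACACAAGHE"
```

In this sketch the later cysteines shift by one position when the gap is three residues long, and a fragment with a four-residue gap is not matched.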

[ 1] Yarden Y., Ullrich A. Annu. Rev. Biochem. 57:443-478(1988). 

[ 2] Sajjadi F.G., Pasquale E.B., Subramani S. New Biol. 3:769-778(1991). 

[ 3] Wicks I.P., Wilkinson D., Salvaris E., Boyd A.W. Proc. Natl. Acad. Sci. U.S.A. 89:1611-1615(1992). 

[1252] 459. Protein kinase C terminal domain 
[1253] 460. Plant thionins signature 

Thionins are small, basic, plant proteins generally toxic to animal cells [1]. They seem to exert their toxic effect at the 
level of the cell membrane but their exact function is not known. They consist of a polypeptide chain of forty five to fifty 
amino acids with three to four internal disulfide bonds. They are found in seeds but also in the cell wall of leaves [2]. 
Thionins are processed from larger precursor proteins [3]. Crambin [4], a hydrophobic plant seed protein, also belongs 
to this family. The pattern to detect this family of proteins includes three of the six cysteine residues involved in disulfide 
bonds. 
[...] The first region, which is located in the N-terminal [...], is a glycine-rich stretch of residues in 
the vicinity of a lysine residue, which [...]. The second region is 
located in the central part of the [...] residue is important for 
the catalytic activity of the enzyme [6]. [...] one specific for serine/threonine 
kinases and the other for tyrosine kinases. A profile was also developed which is based on the alignment in 
[1] and covers the entire catalytic domain. 

[...] known protein kinases belong to the class detected by the pattern [...] 
in this region and are completely missed [...] 
kinases and also Epstein-Barr virus [...] respectively Ser and Arg instead of the 
conserved Lys and which are therefore detected by the [...] specific pattern described below. 

[1244] Consensus pattern: [...] [D is an active site residue] 
ALL tyrosine specific [...] to this class 
detected by the pattern. [...] aminoglycoside phosphotransferases [8,9] and 
herpesviruses ganciclovir kinases [...] to protein kinases. 
This profile also detects receptor [...] ribonucleases. Sequence similarities between 
these [...]. It also detects Arabidopsis 
thaliana kinase-like protein TMKL1 [...] analyzed [...] 
two protein kinase signatures, [...]. 

[ 1] Hanks S.K., Hunter T. FASEB J. 9:576-596(1995). 

[ 2] Hunter T. Meth. Enzymol. 200:3-37(1991). 

[ 3] Hanks S.K., Quinn A.M. Meth. Enzymol. 200:38-62(1991). 

[ 4] Hanks S.K. Curr. Opin. Struct. Biol. 1:369-383(1991). 

[ 5] Hanks S.K., Quinn A.M., Hunter T. Science 241:42-52(1988). 

[ 6] Knighton D.R., Zheng J., Ten Eyck L.F., Ashford V.A., Xuong N.-H., Taylor S.S., Sowadski J.M. Science 253:
407-414(1991). 
[ 7] Bairoch A., Claverie J.-M. Nature 331:22(1988). 
[ 8] Benner S. Nature 329:21-21(1987). 
[ 9] Kirby R. J. Mol. Evol. 30:489-492(1992). 

[10] Littler E., Stuart A.D., Chee M.S. Nature 358:160-162(1992). 
[11] Munoz-Dorado J., Inouye S., Inouye M. Cell 67:995-1006(1991). 

[1245] Receptor tyrosine kinase class II signature 
[...] same topology: an extracellular ligand-binding domain, a [...] kinase domain. 
However they can be classified into [...]. 

[...] class II RTKs is the insulin receptor, 
a heterotetramer of two alpha and two beta chains [...]. The alpha and beta chains are cleavage 
products of a precursor molecule. [...] beta chain traverses the 
membrane and contains the tyrosine protein kinase domain. [...] to belong to class II are: 

- Insulin receptor from vertebrates. - [...] insulin receptor-related receptor 
(IRR), which is most probably a receptor [...]. - Insects insulin-like receptors. - 
Molluscan insulin-related peptide(s) receptor [...]. - [...] Branchiostoma lanceolatum. 

- The Drosophila developmental protein [...] formation of the R7 photoreceptor cell. - The [...] are high affinity 
receptors for nerve growth factor and related neurotrophic factors (BDNF and [...]). - The following [...] 
receptors: - ROS. - LTK (TYK1). - EDDR1 (cak, [...]). - [...] (TKT). - A [...] putative receptor 
tyrosine kinase. While only the [...] exist in the tetrameric 
conformation specific to class II RTK's, all the [...] kinase domain, 
especially around the putative site of autophosphorylation. [...] 
RTK's, which includes the tyrosine residue [...]. 
[ 1] Dawson J.H. Science 240:433-439(1988). 

[ 2] Kimura S., Ikeda-Saito M. Proteins 3:113-120(1988). 

[ 3] Henrissat B., Saloheimo M., Lavaitte S., Knowles J.K.C. Proteins 8:251-257(1990). 
[ 4] Welinder K.G. Biochim. Biophys. Acta 1080:215-220(1991). 

[1236] 455. pfkB family of carbohydrate kinases signatures 

It has been shown [1,2,3] that the following carbohydrate and purine kinases are evolutionarily related and can be 
grouped into a single family, which is known [1] as the 'pfkB family': - Fructokinase (EC 2.7.1.4) (gene scrK). - 
6-phosphofructokinase isozyme 2 (EC 2.7.1.11) (phosphofructokinase-2) (gene pfkB). pfkB is a minor phosphofructokinase 
isozyme in Escherichia coli and is not evolutionarily related to the major isozyme (gene pfkA). Plant 6-phosphofructokinases 
also belong to this family. - Ribokinase (EC 2.7.1.15) (gene rbsK). - Adenosine kinase (EC 2.7.1.20) (gene 
ADK). - 2-dehydro-3-deoxygluconokinase (EC 2.7.1.45) (gene kdgK). - 1-phosphofructokinase (EC 2.7.1.56) (fructose 
1-phosphate kinase) (gene fruK). - Inosine-guanosine kinase (EC 2.7.1.73) (gene gsk). - Tagatose-6-phosphate kinase 
(EC 2.7.1.144) (phosphotagatokinase) (gene lacC). - Escherichia coli hypothetical protein yeiC. - Escherichia coli 
hypothetical protein yeiI. - Escherichia coli hypothetical protein yhfQ. - Escherichia coli hypothetical protein yihV. - Bacillus 
subtilis hypothetical protein yxdC. - Yeast hypothetical protein YJR105w. All the above kinases are proteins of from 280 
to 430 amino acid residues that share a few regions of sequence similarity. Two of these regions were selected as 
signature patterns. The first pattern is based on a region rich in glycine which is located in the N-terminal section of 
these enzymes, while the second pattern is based on a conserved region in the C-terminal section. 
[1237] Consensus pattern: [AG]-G-x(0,1)-[GAP]-x-N-x-[STA]-x(6)-[GS]-x(9)-G 
Consensus pattern: [DNSK]-[PSTV]-x-[SAG](2)-[GD]-D-x(3)-[SAGV]-[AG]-[LIVMFYA]-[LIVMSTAP] 

[ 1] Wu L.-F., Reizer A., Reizer J., Cai B., Tomich J.M., Saier M.H. Jr. J. Bacteriol. 173:3117-3127(1991). 
[ 2] Orchard L.M.D., Kornberg H.L. Proc. R. Soc. Lond., B, Biol. Sci. 242:87-90(1990). 
[ 3] Blatch G.L., Scholle R.R., Woods D.R. Gene 95:17-23(1990). 

[1238] 456. Phospholipase A2 active sites signatures 

Phospholipase A2 (EC 3.1.1.4) (PA2) [1,2] is an enzyme which releases fatty acids from the second carbon group of 
glycerol. PA2's are small and rigid proteins of 120 amino-acid residues that have four to seven disulfide bonds. PA2 
binds a calcium ion which is required for activity. The side chains of two conserved residues, a histidine and an aspartic 
acid, participate in a 'catalytic network'. Many PA2's have been sequenced from snakes, lizards, bees and mammals. 
In the latter, there are at least four forms: pancreatic, membrane-associated as well as two less characterized forms. 
The venom of most snakes contains multiple forms of PA2. Some of them are presynaptic neurotoxins which inhibit 
neuromuscular transmission by blocking acetylcholine release from the nerve termini. Two different signature patterns 
were derived for PA2's. The first is centered on the active site histidine and contains three cysteines involved in disulfide 
bonds. The second is centered on the active site aspartic acid and also contains three cysteines involved in disulfide 
bonds. 

[1239] Consensus pattern: C-C-x(2)-H-x(2)-C [H is the active site residue] This pattern will not detect some snake 
toxins homologous with PA2 but which have lost their catalytic activity as well as otoconin-22, a Xenopus protein from 
the aragonitic otoconia which is also unlikely to be enzymatically active. 

Consensus pattern: [LIVMA]-C-{LIVMFYWPCST}-C-D-x(5)-C [D is the active site residue] The majority of functional 
and non-functional PA2's. Undetected sequences are bee PA2, gila monster PA2's, PA2 PL-X from habu and PA2 PA-5 
from mulga. 
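
As an illustrative sketch only, the two active-site patterns above can be applied together, so that a sequence is annotated with both the histidine and the aspartic acid position when each signature is present. The example assumes Python; the names HIS_SITE, ASP_SITE and pa2_active_sites are hypothetical, and the toy sequence is invented to satisfy both patterns.

```python
import re

# C-C-x(2)-H-x(2)-C, with the active-site histidine captured.
HIS_SITE = re.compile(r"CC.{2}(H).{2}C")
# [LIVMA]-C-{LIVMFYWPCST}-C-D-x(5)-C, with the active-site aspartic acid captured.
ASP_SITE = re.compile(r"[LIVMA]C[^LIVMFYWPCST]C(D).{5}C")


def pa2_active_sites(sequence):
    """Return 1-based positions of the first His and Asp signature hits."""
    his = HIS_SITE.search(sequence)
    asp = ASP_SITE.search(sequence)
    return {
        "his": his.start(1) + 1 if his else None,
        "asp": asp.start(1) + 1 if asp else None,
    }


# Invented toy sequence; not a phospholipase sequence of the disclosure.
TOY = "AACCAAHAACGGACKCDAAAAACA"
sites = pa2_active_sites(TOY)
```

A sequence matching only one of the two signatures is reported with None for the other, which corresponds to the cases noted above where one pattern fails to detect a homologous protein.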

[ 1] Davidson F.F, Dennis E.A. J. Mol. Evol. 31:228-238(1990). 

[ 2] Gomez F., Vandermeers A., Vandermeers-Piret M.-C., Herzog R., Rathe J., Stievenart M., Winand J., Chris- 
tophe J. Eur. J. Biochem. 186:23-33(1989). 

[1240] 457. Phosphorylase pyridoxal-phosphate attachment site. Phosphorylases (EC 2.4.1.1) [1] are important allosteric 
enzymes in carbohydrate metabolism. They catalyze the formation of glucose 1-phosphate from polyglucose 
such as glycogen, starch or maltodextrin. Enzymes from different sources differ in their regulatory mechanisms and 
their natural substrates. However, all known phosphorylases share catalytic and structural properties. They are pyri- 
doxal-phosphate dependent enzymes; the pyridoxal-P group is attached to a lysine residue around which the sequence 
is highly conserved and can be used as a signature pattern to detect this class of enzymes. 
[1241] Consensus pattern: E-A-[SC]-G-x-[GS]-x-M-K-x(2)-[LM]-N [K is the pyridoxal-P attachment site] 
[ 1] Fukui T., Shimomura S., Nakano K. Mol. Cell. Biochem. 42:129-144(1982). 
[1242] 458. Protein kinases signatures and profile 

Eukaryotic protein kinases [1 to 5] are enzymes that belong to a very extensive family of proteins which share a con- 
[...] a prolyl residue in the carboxyl terminal position. Bacterial aminopeptidase P II (gene pepP) [1], proline 
dipeptidase (gene pepQ) [2], and human proline dipeptidase (gene PEPD) [3] are evolutionarily related. These proteins 
are manganese metalloenzymes. Yeast hypothetical proteins YER078c and YFR006w and Mycobacterium tuberculosis 
hypothetical protein MtCY49.29c also belong to this family. As a signature pattern for these enzymes a conserved 
region was selected that contains three histidine residues. 

[1231] Consensus pattern: [HA]-[GSYR]-[LIVMT]-[SG]-H-x-[LIV]-G-[LIVM]-x-[IV]-H-[DE] 
[1232] Methionine aminopeptidase signatures (pep2) 

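[...] prokaryotic proteins if the penultimate amino acid is small 
and uncharged. All MAP studied to date are monomeric proteins that require cobalt ions for activity. Two subfamilies 
of MAP enzymes are known to exist [1,2]. While being evolutionarily related, they only share a limited amount of sequence 
similarity mostly clustered around the residues shown, in the Escherichia coli MAP [3], to be involved in cobalt-binding. 
The first family consists of enzymes from prokaryotes as well as eukaryotic MAP-1, while the second group 
is made up of archaebacterial MAP and eukaryotic MAP-2. The second subfamily also includes proteins which do not 
seem to be MAP, but that are clearly evolutionarily related such as mouse proliferation-associated protein 1 and fission 
yeast curved DNA-binding protein. For each of these subfamilies, a specific signature pattern that 
includes residues known to be involved in cobalt-binding has been developed. 

Consensus pattern: [...]-x(3)-[DN] [The second D and the last D/N are cobalt ligands] 

[ 1] Arfin S.M., Kendall R.L., Hall L., Weaver L.H., Stewart A.E., Matthews B.W., Bradshaw R.A. Proc. Natl. Acad. 
Sci. U.S.A. 92:7714-7718(1995). 

[ 2] Keeling P.J., Doolittle W.F. Trends Biochem. Sci. 21:285-286(1996). 
[ 3] Roderick S.L., Matthews B.W. Biochemistry 32:3907-3912(1993). 
[ 4] Rawlings N.D., Barrett A.J. Meth. Enzymol. 248:183-228(1995). 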

[1234] 454. Peroxidases signatures 

[...] distributed throughout [...] fungi, plants, and vertebrates. In [...] protoporphyrin IX and the fifth ligand of the heme 
iron is a histidine (known as the proximal histidine). [...] (the distal histidine) serves as an acid-base 
catalyst in the reaction between [...]. [...] 
residues are more or less conserved in [...]. [...] one or both of these 
regions can be found are listed below. - Yeast cytochrome c peroxidase (EC 1.11.1.5). - Myeloperoxidase (EC 1.11.1.7) 
(MPO). MPO is found in granulocytes and monocytes [...] 
system of neutrophils. - Lactoperoxidase [...] acts as an antimicrobial 
agent. - Eosinophil peroxidase (EC 1.11.1.7) [...] cytoplasmic granules of eosinophils. - 
Thyroid peroxidase (EC 1.11.1.8) (TPO). [...] biosynthesis of thyroid hormones. It catalyzes 
the iodination and coupling of [...] to yield the hormones T3 and T4. - 
Fungal ligninases. Ligninase [...] 

[...] 

Consensus pattern: [SGATV]-x(3)-[LIVMA]-R-[LIVMA]-x-[FW]-H-x-[SAC] [H is [...]] 
1587-1643(1995). [ 3] Shi G.-P., Chapman H.A., Bhairi S.M., Deleeuw C., Reddy V.Y., Weiss S.J. FEBS Lett. 357:
129-134(1995). [ 4] Velasco G., Ferrando A.A., Puente X.S., Sanchez L.M., Lopez-Otin C. J. Biol. Chem. 269:
27136-27142(1994). [ 5] Chapot-Chartier M.P., Nardi M., Chopin M.C., Chopin A., Gripon J.C. Appl. Environ. Microbiol. 
59:330-333(1993). [ 6] Higgins D.G., McConnell D.J., Sharp P.M. Nature 340:604-604(1989). [ 7] Rawlings N.D., Barrett 
A.J. Meth. Enzymol. 244:461-486(1994). 

[1220] 450. (peptidase M24) Aminopeptidase P and proline dipeptidase signature (1). 

Aminopeptidase P (EC 3.4.11.9) is the enzyme responsible for the release of any N-terminal amino acid adjacent to a 
proline residue. Proline dipeptidase (EC 3.4.13.9) (prolidase) splits dipeptides with a prolyl residue in the carboxyl 
terminal position. Bacterial aminopeptidase P II (gene pepP) [1], proline dipeptidase (gene pepQ) [2], and human proline 
dipeptidase (gene PEPD) [3] are evolutionarily related. These proteins are manganese metalloenzymes. Yeast hypothetical 
proteins YER078c and YFR006w and Mycobacterium tuberculosis hypothetical protein MtCY49.29c also belong 
to this family. As a signature pattern for these enzymes a conserved region that contains three histidine residues 
has been developed. 

[1221] Consensus pattern: [HA]-[GSYR]-[LIVMT]-[SG]-H-x-[LIV]-G-[LIVM]-x-[IV]-H-[DE] 

[1222] [ 1] Yoshimoto T., Tone H., Honda T., Osatomi K., Kobayashi R., Tsuru D. J. Biochem. 105:412-416(1989). 
[ 2] Nakahigashi K., Inokuchi K. Nucleic Acids Res. 18:6439-6439(1990). [ 3] Endo F., Tanoue A., Nakai H., Hata A., 
Indo Y., Titani K., Matsuda I. J. Biol. Chem. 264:4476-4481(1989). [ 4] Rawlings N.D., Barrett A.J. Meth. Enzymol. 248:
183-228(1995). 

[1223] Methionine aminopeptidase signatures. (2). Methionine aminopeptidase (EC 3.4.11.18) (MAP) is responsible 
for the removal of the amino-terminal (initiator) methionine from nascent eukaryotic cytosolic and cytoplasmic prokaryotic 
proteins if the penultimate amino acid is small and uncharged. All MAP studied to date are monomeric proteins 
that require cobalt ions for activity. Two subfamilies of MAP enzymes are known to exist [1,2]. While being evolutionarily 
related, they only share a limited amount of sequence similarity mostly clustered around the residues shown, in the 
Escherichia coli MAP [3], to be involved in cobalt-binding. The first family consists of enzymes from prokaryotes as well 
as eukaryotic MAP-1, while the second group is made up of archaebacterial MAP and eukaryotic MAP-2. The second 
subfamily also includes proteins which do not seem to be MAP, but that are clearly evolutionarily related such as mouse 
proliferation-associated protein 1 and fission yeast curved DNA-binding protein. For each of these subfamilies, a specific 
signature pattern that includes residues known to be involved in cobalt-binding has been developed. 
[1224] Consensus pattern: [MFY]-x-G-H-G-[LIVMC]-[GSH]-x(3)-H-x(4HLIVM]-x-[HN]-[YWV] [H is a cobalt ligand]- 
Consensus pattern: [DAHLIVMY]-x-K-[UVM]-D-x-G-x-[HQ]-[LIVM]-[DNS]-G-x(3)-[DN] [The second D and the last D/ 
N are cobalt ligands] 
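
The consensus patterns in this section use the PROSITE pattern syntax, so they can be translated mechanically into regular expressions. The following Python sketch is an illustration only and is not part of the original disclosure: the function name `prosite_to_regex` and the constant names are hypothetical, and the two methionine aminopeptidase patterns are transcribed in cleaned-up form (garbled residue classes read as [LIVM]).

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style consensus pattern into a Python regex string.

    Supported elements: single residues (G), residue classes ([LIVM]),
    excluded classes ({P}), wildcards (x) and repeat counts such as x(3),
    x(2,4) or [LIVMF](3).
    """
    parts = []
    for element in pattern.strip().strip("-.").split("-"):
        # Split an element into its core and an optional "(n)" or "(n,m)" suffix.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError("empty pattern element in %r" % pattern)
        core, lo, hi = m.group(1), m.group(2), m.group(3)
        if core == "x":
            regex = "."                      # any residue
        elif core.startswith("[") and core.endswith("]"):
            regex = core                     # allowed residues
        elif core.startswith("{") and core.endswith("}"):
            regex = "[^" + core[1:-1] + "]"  # forbidden residues
        elif len(core) == 1 and core.isupper():
            regex = core                     # one fixed residue
        else:
            raise ValueError("unsupported pattern element %r" % element)
        if lo is not None:
            regex += "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(regex)
    return "".join(parts)

# The two methionine aminopeptidase signatures, as read from the text above.
MAP_PATTERN_1 = "[MFY]-x-G-H-G-[LIVMC]-[GSH]-x(3)-H-x(4)-[LIVM]-x-[HN]-[YWV]"
MAP_PATTERN_2 = "[DA]-[LIVMY]-x-K-[LIVM]-D-x-G-x-[HQ]-[LIVM]-[DNS]-G-x(3)-[DN]"

if __name__ == "__main__":
    for name, pat in (("MAP subfamily 1", MAP_PATTERN_1),
                      ("MAP subfamily 2", MAP_PATTERN_2)):
        print(name, "->", prosite_to_regex(pat))
```

A compiled regex obtained this way can be applied to a protein sequence with `re.search`; a hit only indicates that the signature is present and does not by itself establish enzymatic function.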

[1225] [ 1] Arfin S.M., Kendall R.L., Hall L., Weaver L.H., Stewart A.E., Matthews B.W., Bradshaw R.A. Proc. Natl.
Acad. Sci. U.S.A. 92:7714-7718(1995).
[ 2] Keeling P.J., Doolittle W.F. Trends Biochem. Sci. 21:285-286(1996).
[ 3] Roderick S.L., Matthews B.W. Biochemistry 32:3907-3912(1993).
[ 4] Rawlings N.D., Barrett A.J. Meth. Enzymol. 248:183-228(1995).

[1226] 451. Cytochrome P450 cysteine heme-iron ligand signature 

Cytochrome P450's [1,2,3,E1] are a group of enzymes involved in the oxidative metabolism of a large number of natural
compounds (such as steroids, fatty acids, prostaglandins, leukotrienes, etc.) as well as drugs, carcinogens and mutagens.
Based on sequence similarities, P450's have been classified into about forty different families [4,5]. P450's are
proteins of 400 to 530 amino acids; the only exception is Bacillus BM-3 (CYP102), which is a protein of 1048 residues
that contains an N-terminal P450 domain followed by a reductase domain. P450's are heme proteins. A conserved
cysteine residue in the C-terminal part of P450's is involved in binding the heme iron in the fifth coordination site. From
a region around this residue, a ten-residue signature specific to P450's was developed.
[1227] Consensus pattern: [FW]-[SGNH]-x-[GD]-x-[RHPT]-x-C-[LIVMFAP]-[GAD] [C is the heme iron ligand]
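
Because the pattern marks the cysteine as the heme iron ligand, a scan can report the position of that residue rather than just the presence of a match. The Python sketch below is illustrative and assumes a hand-written regular expression transcribed from the ten-residue pattern above; the names `P450_SIGNATURE` and `heme_ligand_positions` and the example peptide are hypothetical.

```python
import re
from typing import List

# Hand transcription of [FW]-[SGNH]-x-[GD]-x-[RHPT]-x-C-[LIVMFAP]-[GAD].
# The capturing group isolates the cysteine annotated as the heme iron ligand.
P450_SIGNATURE = re.compile(r"[FW][SGNH].[GD].[RHPT].(C)[LIVMFAP][GAD]")

def heme_ligand_positions(sequence: str) -> List[int]:
    """Return the 1-based position of the ligand cysteine for every
    non-overlapping signature match in the sequence."""
    return [m.start(1) + 1 for m in P450_SIGNATURE.finditer(sequence.upper())]

if __name__ == "__main__":
    # Made-up peptide, used only to exercise the function.
    print(heme_ligand_positions("AAAFGAGARACLGAAA"))
```

Positions are reported 1-based to match the usual residue numbering of protein sequences.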

[ 1] Nebert D.W., Gonzalez F.J. Annu. Rev. Biochem. 56:945-993(1987).

[ 2] Coon M.J., Ding X., Pernecky S.J., Vaz A.D.N. FASEB J. 6:669-673(1992).

[ 3] Guengerich F.P. J. Biol. Chem. 266:10019-10022(1991).

[ 4] Nelson D.R., Kamataki T., Waxman D.J., Guengerich F.P., Estabrook R.W., Feyereisen R., Gonzalez F.J.,
Coon M.J., Gunsalus I.C., Gotoh O., Okuda K., Nebert D.W. DNA Cell Biol. 12:1-51(1993).
[ 5] Degtyarenko K.N., Archakov A.I. FEBS Lett. 332:1-8(1993).

[1228] 452. (Pec Lyase) Pectate lyase 

This enzyme forms a right-handed beta helix structure. Pectate lyase is an enzyme involved in the maceration and soft
rotting of plant tissue.

[1229] [1] Yoder MD, Keen NT, Jurnak F, Science 1993;260:1503-1507.

[1230] 453. (pep M24) Aminopeptidase P and proline dipeptidase signature (pep1) 

Aminopeptidase P (EC 3.4.11.9) is the enzyme responsible for the release of any N-terminal amino acid adjacent to a
[1202] Note: these proteins belong to families S9A/S9B/S9C in the classification of peptidases [4].

[ 1] Rawlings N.D., Polgar L., Barrett A.J. Biochem. J. 279:907-911(1991).
[ 2] Barrett A.J., Rawlings N.D.
[ 3] Polgar L., Szabo E.

[ 4] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:19-61(1994).

[1203] 445. (Pterin 4a)
Pterin 4 alpha carbinolamine dehydratase
[1204] Pterin 4 alpha carbinolamine dehydratase is aka DCoH (dimerisation cofactor of hepatocyte nuclear factor
1-alpha).

[1205] Number of members: 11

[1206] [1] Cronk JD, Endrizzi JA, Alber T; Medline: 97052967 High-resolution structures of the bifunctional enzyme
and transcriptional coactivator DCoH and its complex with a product analogue. Protein Sci 1996;5:1963-1972.
[1207] 446. (Pyridox oxidase)

Pyridoxamine 5'-phosphate oxidase signature

[1208] Pyridoxamine 5'-phosphate oxidase (EC 1.4.3.5) is an FMN flavoprotein involved in the de novo synthesis of
pyridoxine (vitamin B6) and pyridoxal phosphate. It oxidizes pyridoxamine-5-P (PMP) and pyridoxine-5-P (PNP) to
pyridoxal-5-P. The sequences of the enzyme from bacterial (genes pdxH or fprA) [1] and fungal (gene PDX3) [2] sources
show that this protein has been highly conserved throughout evolution.

PdxH is evolutionarily related [3] to one of the enzymes in the phenazine biosynthesis protein pathway, phzD (also
known as phzG). As a signature pattern, a highly conserved region located in the C-terminal part of these
enzymes was selected.

- Consensus pattern: [LIVF]-E-F-W-[QHG]-x(4)-R-[LIVM]-H-[DNE]-R
[ 1] Lam H.-M., Winkler M.E. J. Bacteriol. 174:6033-6045(1992).

[ 2] Loubbardi A., Karst F., Guilloton M., Marcireau C. J. Bacteriol. 177:1817-1823(1995).
[ 3] Pierson L.S. III, Gaffney T., Lam S., Gong F. FEMS Microbiol. Lett. 134:299-307(1995).

[1209] 447. (Pyrophosphatase) 
Inorganic pyrophosphatase signature 

[1210] Inorganic pyrophosphatase (EC 3.6.1.1) (PPase) [1,2] is the enzyme responsible for the hydrolysis of pyrophosphate
(PPi), which is formed principally as the product of the many biosynthetic reactions that utilize ATP. All known
PPases require the presence of divalent metal cations, with magnesium conferring the highest activity. Among other
residues, a lysine has been postulated to be part of or close to the active site. PPases have been sequenced from bacteria
such as Escherichia coli (homohexamer), thermophilic bacteria PS-3 and Thermus thermophilus, from the archaebacteria
Thermoplasma acidophilum, from fungi (homodimer), from a plant, and from bovine retina. In yeast, a mitochondrial
isoform of PPase has been characterized which seems to be involved in energy production and whose activity is
stimulated by uncouplers of ATP synthesis.

[1211] The sequences of PPases share some regions of similarity. As a signature pattern, a region was selected
that contains three conserved aspartates that are involved in the binding of cations.

- Consensus pattern: D-[SGDN]-D-[PE]-[LIVMF]-D-[LIVMGAC]
[The three D's bind divalent metal cations]

[ 1] Lahti R., Kolakowski L.F. Jr., Heinonen J., Vihinen M., Pohjanoksa K., Cooperman B.S. Biochim. Biophys. Acta
1038:338-345(1990).

[ 2] Cooperman B.S., Baykov A.A., Lahti R. Trends Biochem. Sci. 17:262-266(1992).

[1212] 448. (Peptidase S26) 
Signal peptidases I signatures. 

[1213] Signal peptidases (SPases) [1] (aka leader peptidases) remove the signal peptides from secretory proteins.
In prokaryotes three types of SPases are known: type I (gene lepB), which is responsible for the processing of the
majority of exported pre-proteins; type II (gene lsp), which only processes lipoproteins; and a third type involved in the
processing of pili subunits. SPase I (EC 3.4.21.89) is an integral membrane protein that is anchored in the cytoplasmic
membrane by one (in B. subtilis) or two (in E. coli) N-terminal transmembrane domains with the main part of the protein
- Consensus pattern: W*™*^^

- Note: in position 11 of the pattern most of these enzymes have Gly.

[1197] i^P^CA)' 7 , Tayl ° r M Gen ° 43:287 - 293 ( 1986 )- 
Prokaryotic-type carbonic anhydrases signatures 

decc^ttic^cyanatebycyL^ 

In photosynthetic bacteria an^ P lan,^p te T? A linS' PreV6ntS ^ deP ' 8tk5n * Ce,,u,ar bicart >°<*t« [1J. 
ch.orop.as. CA are structura.lv L evISa I ^ ^^1^^^ (2J " — 1*« 

different forms of eukaryotic CA's (see <S2uJEK "* ,r0m ,he 008 groups the many 

from Haemophi.us influenzae aj ^SSS^SZ^ T «* - »" 30? 

™^-~nta^^^ 

- Consensus pattern: C-[SA]-D-S-R-[LIVM]-x-[AP]

- Consensus pattern: [EQ]-Y-A-[LIVM]-x(2)-[LIVM]-x(4)-[LIVMF](3)-x-G-H-x(2)-C-G

[1198] 444. (Prolyl oligopep)

Prolyl oligopeptidase family serine active site 

•P— M) M ton tacerta (Fla^nCSiS^l^^^^^^^to 

' Z^tt'JSX^" 1 "*- ,9e ™ **» — — ~~ — - •» c 

- Dipeptidyl peptidase IV (EC 3.4. 1 4 5) (DPP \V\ dpp i v ic a « «^ *u 
[1190] [ 1] Villalba M., Batanero E., Lopez-Otin C., Sanchez L.M., Monsalve R.I., Gonzalez De La Pena M.A., Lahoz
C., Rodriguez R. Eur. J. Biochem. 216:863-869(1993).
[1191] 439. Pollen allergen

This family contains allergens Lol p I, II and III from Lolium perenne.
Number of members: 49 
[1] 

Medline: 90105394 

Complete primary structure of a Lolium perenne (perennial rye grass) pollen allergen, Lol p III: comparison with known 
Lol p I and II sequences. 
Ansari AA, Shenbagamurthi P, Marsh DG; 
Biochemistry 1989;28:8665-8670.
[1192] 440. Porphobilinogen deaminase cofactor-binding site

Porphobilinogen deaminase (EC 4.3.1.8), or hydroxymethylbilane synthase, is an enzyme involved in the biosynthesis
of porphyrins and related macrocycles. It catalyzes the assembly of four porphobilinogen (PBG) units in a head-to-tail
fashion to form hydroxymethylbilane.

The enzyme covalently binds a dipyrromethane cofactor to which the PBG subunits are added in a stepwise fashion.
In the Escherichia coli enzyme (gene hemC), this cofactor has been shown [1] to be bound by the sulfur atom of a
cysteine. The region around this cysteine is conserved in porphobilinogen deaminases from various prokaryotic and
eukaryotic sources.

- Consensus pattern: E-R-x-[LIVMFA]-x(3)-[LIVMF]-x-G-[GSA]-C-x-[IVT]-P-[LIVMF]-[GSA]
[C is the cofactor attachment site]

[1193] [ 1] Miller A.D., Hart G.J., Packman L.C., Battersby A.R. Biochem. J. 254:915-918(1988).
[1194] 441. Presenilin

Mutations in presenilin-1 are a major cause of early onset Alzheimer's disease [2]. It has been found that presenilin-1 
(Swiss: P49768) binds to beta-catenin in vivo [4]. This family also contains SPE proteins from C.elegans. 
Number of members: 23 
[1] 

Medline: 98045995 

Presenilins and Alzheimer's disease. 

Kim TW, Tanzi RE; 

Curr Opin Neurobiol 1997;7:683-688.

[2]Medline: 98045995 
Presenilins and Alzheimer's disease. 
Kim TW, Tanzi RE; 

Curr Opin Neurobiol 1997;7:683-688. 

[3]Medline: 98099802 
Interaction of presenilins with the filamin family of actin-binding proteins. 
Zhang W, Han SW, McKeel DW, Goate A, Wu JY; 

J Neurosci 1998;18:914-922. 

[4]Medline: 99004850 

Destabilisation of beta-catenin by mutations in presenilin-1 potentiates neuronal apoptosis. 

Zhang Z, Hartmann H, Do VM, Abramowski D, Sturchler-Pierrat 
C, Staufenbiel M, Sommer B, van de Wetering M, Clevers H, 
Sattig P, De Strooper B, He X, Yankner BA; 

Nature 1998;395:698-702. 
[1195] 442. (Pribosyltran) Purine/pyrimidine phosphoribosyl transferases signature 

Phosphoribosyltransferases (PRT) are enzymes that catalyze the synthesis of beta-n-5'-monophosphates from phosphoribosylpyrophosphate
(PRPP) and an enzyme-specific amine. A number of PRTs are involved in the biosynthesis
of purine, pyrimidine, and pyridine nucleotides, or in the salvage of purines and pyrimidines. These enzymes are:

- Adenine phosphoribosyltransferase (EC 2.4.2.7) (APRT), which is involved in purine salvage.

- Hypoxanthine-guanine or hypoxanthine phosphoribosyltransferase (EC 2.4.2.8) (HGPRT or HPRT), which are
involved in purine salvage.

- Orotate phosphoribosyltransferase (EC 2.4.2.10) (OPRT), which is involved in pyrimidine biosynthesis.

- Amido phosphoribosyltransferase (EC 2.4.2.14), which is involved in purine biosynthesis.

- Xanthine-guanine phosphoribosyltransferase (EC 2.4.2.22) (XGPRT), which is involved in purine salvage.
central part

- Consensus pattern: [FY]-x(2)-T-R-H-N-x-G-x(2)-[LIVMFA](2)-[DE]

- Consensus pattern: [GS]-x(3)-H-N-G-[LIVM]-[KR]-[DNS]-[LIVMT]

[ £SSl) M R ' 06 U ^ F M ' Galind ° J S69Ura M • G-neros G. EMBO a. 

[ 2] De La Vega F.M., Galindo J.M., Old I.G., Guarneros G. Gene 169:97-100(1996).
[ 3] Ouzounis C., Bork P., Casari G., Sander C. Protein Sci.

[1187] 436. (Peptidase M17) Cytosol aminopeptidase signature 

s c iKr:s 

(EC 3.4.11.1) (LAP) but has been shovm [T]"o be ide^ 

nopepjdase is a Hexamer o, I*** cha^ ZS^&^ST"" ^ °^ "* 

i«z^^^^ 

bind manganese. mamnr.al.an enzyme are absolutely conserved in pepA where they presumably 

A^utS 
invoked in binding metali^ 

- Consensus pattern: N-T-D-A-E-G-R-L [The D and the E are zinc/manganese ligands]
- Note: these proteins belong to family M17 in the classification of peptidases [4].

[ 1] Matsushima M., Takahashi T., Ichinose M., Miki K., Kurokawa K., Takahashi K. Biochem. Biophys. Res. Commun.
178:1459-1464(1991).

[ 4] Rawlings N.D., Barrett A.J. Meth. Enzymol. 248:183-228(1995).
[1188] 437. Assemblin (Peptidase family S21)
Medline: 96399137 

Three-dimensional structure of human cytomegalovirus protease 

STv i BG s,evens AM ' s,e9eman RA " S,u ™ an ej . 

Slings^" M °' ^ Ho,v,erda BC - 

Nature 1996;383:279-282. 
Number of members: 29 

[1189] 438. Pollen proteins Ole e I family signature 

The following plant pollen proteins, whose biological function is not yet known, are structurally related [1]:

- Olive tree pollen major allergen (Ole e I)

™*CxCxXXXXX^^ 

* * * C : conserved cysteine involved in a disulfide bond. 
'*': position of the pattern. 

- Consensus pattern: [EQ]-G-x-V-Y-C-D-T-C-R [The two C's are probably involved in disulfide bonds]
Number of members: 122

[1182] 431. (Parvo coat) Parvovirus coat protein. 72 members.
[1183] 432. Pectinesterase signatures 

Pectinesterase (EC 3.1.1.11) (pectin methylesterase) catalyzes the hydrolysis of pectin into pectate and methanol. In
plants, it plays an important role in cell wall metabolism during fruit ripening. In plant bacterial pathogens such as
Erwinia carotovora and in fungal pathogens such as Aspergillus niger, pectinesterase is involved in maceration and
soft-rotting of plant tissue.

Prokaryotic and eukaryotic pectinesterases share a few regions of sequence similarity [1,2,3]. Two of these regions
were selected as signature patterns.

The first is based on a region in the N-terminal section of these enzymes; it contains a conserved tyrosine which may 
play a role in the catalytic mechanism [3]. The second pattern corresponds to the best conserved region, an octapeptide 
located in the central part of these enzymes. 

- Consensus pattern: [GSTNP]-x(6)-[^ 

- Consensus pattern: [IV]-x-G-[STAD]-[LIVT]-D-[FYI]-[IV]-[FSN]-G

[ 1] Ray J., Knapp J., Grierson D., Bird C., Schuch W. Eur. J. Biochem. 174:119-124(1988).

[ 2] Plastow G.S. Mol. Microbiol. 2:247-254(1988).

[ 3] Markovic O., Joernvall H. Protein Sci. 1:1288-1292(1992).

[1184] 433. Pentapeptide repeats (8 copies)

These repeats are found in many cyanobacterial proteins. 

The repeats were first identified in hglK [1]. The function of these repeats is unknown. 
The structure of this repeat has been predicted to be a beta-helix [2]. 

The repeat can be approximately described as A(D/N)LXX, where X can be any amino acid.
Number of members: 75
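
The approximate repeat description A(D/N)LXX lends itself to a simple scan for runs of consecutive five-residue units. The Python sketch below is illustrative only: the function name `tandem_repeat_runs`, the default threshold of two copies and the example peptide are assumptions, and a plain regular expression is a much cruder detector than the profile-based family model the entry refers to.

```python
import re
from typing import List, Tuple

def tandem_repeat_runs(sequence: str, min_copies: int = 2) -> List[Tuple[int, int]]:
    """Find runs of consecutive A(D/N)LXX units.

    Returns a list of (start, copies) tuples, where start is the 1-based
    position of the first residue of the run and copies is the number of
    back-to-back five-residue units in it.
    """
    # One unit is A, then D or N, then L, then any two residues.
    run = re.compile(r"(?:A[DN]L..){%d,}" % min_copies)
    return [(m.start() + 1, len(m.group()) // 5)
            for m in run.finditer(sequence.upper())]

if __name__ == "__main__":
    # Made-up peptide with three consecutive units after a two-residue leader.
    print(tandem_repeat_runs("MKADLTGANLSEADLRQPP"))
```

Raising `min_copies` (for instance to 8, the copy number given in the family title) makes the scan stricter at the cost of missing shorter or imperfect repeat arrays.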
[1]

Medline: 96062225 

The hglK gene is required for localization of heterocyst-specific glycolipids in the cyanobacterium 
Anabaena sp. strain PCC 7120. 
Black K, Buikema WJ, Haselkorn R; 

J Bacteriol 1995;177:6440-6448. 

[2]Medline: 98318059 
Structure and distribution of pentapeptide repeats in bacteria. 
Bateman A, Murzin A, Teichmann SA; 

Protein Sci 1998;7:1477-1480. 

[3]Medline: 98316713

Characterisation of an Arabidopsis cDNA encoding a thylakoid lumen protein related to a novel 'pentapeptide repeat'
family of proteins.
Kieselbach T, Mant A, Robinson C, Schroder WP;

FEBS Lett 1998;428:241-244.
[1185] 434. Polypeptide deformylase

[1]

Medline: 97002011 

A new subclass of the zinc metalloproteases superfamily revealed by the solution structure of peptide deformylase 
Meinnel T, Blanquet S, Dardel F; 

J Mol Biol 1996;262:375-386. 

[2]Medline: 98332750 
Solution structure of nickel-peptide deformylase. 
Dardel F, Ragusa S, Lazennec C, Blanquet S, Meinnel T; 

J Mol Biol 1998;280:501-513. 
Number of members: 21 

[1186] 435. Peptidyl-tRNA hydrolase signatures

Peptidyl-tRNA hydrolase (EC 3.1.1.29) (PTH) is a bacterial enzyme that cleaves peptidyl-tRNA or N-acyl-aminoacyl-tRNA
to yield free peptides or N-acyl-amino acids and tRNA. The natural substrate for this enzyme may be peptidyl-tRNA
that drops off the ribosome during protein synthesis [1,2]. Bacterial PTH has been found [2,3] to be evolutionarily
related to yeast hypothetical protein YHR189w.

PTH and YHR189w are proteins of about 200 amino acid residues. As signature patterns, two conserved regions were
selected that each contain a histidine. The first of these regions is located in the N-terminal section, the other in the
- Arabidopsis thaliana peptide transporters PTR2-A and PTR2-B (also known as the histidine transporting protein
NTR1).

- Arabidopsis thaliana proton-dependent nitrate/chlorate transporter CHL1 

- Lactococcus proton-dependent di- and tri-peptide transporter dtpT 

- Caenorhabditis elegans hypothetical protein C06G8.2.

- Caenorhabditis elegans hypothetical protein F56F4.5.

- Caenorhabditis elegans hypothetical protein K04E7.2.

- Escherichia coli hypothetical protein ybgH. 

- Escherichia coli hypothetical protein ydgR 

- Escherichia coli hypothetical protein yhiR 

- Escherichia coli hypothetical protein yjdL 

- Bacillus subtilis hypothetical protein yclR 

region, a cytoplasmic loop as^ 

the fifth traZembr^egTon transmembrane regron. The second pattern corresponds to the core of 

- Consensus pattern: [FYT]-x(2)-[LMFY]-[FYV]-[LIVMFYWA]-x-[IVG]-N-[LIVMAG]-G-[GSA]-[LIMF]
[ 1] Paulsen I.T., Skurray R.A. Trends Biochem. Sci. 19:404-404(1994).

[ 2] Steiner H.-Y., Naider F., Becker J.M. Mol. Microbiol. 16:825-834(1995).

RNA binding in fly Pumilio and worm FBF-1 anH frp o b^.k „ . - « 

embryonic developmentby binding s^ue^ls^aT^r^H pro, f ™ f " nct,on as translational repressors in early 
in fly Hunchback mRNA, or the p^nt^utetton dement ^Mf! 9 mR ® (e 9 " the nanos response element (NRE) 
domains are also pfeusible °"" r ^ *" ^ Puf 

domain by HMM analysis. JSN1_YEAST. for instance, appears to also contain a single FIRM 

Puf domains usually occur as a tandem repeat of 8 domains 
o^il^ 

^^^S^^^^^^ — * *NA 199,3:1421-1433. 

the domain is currently unknot ^SZZZZT ' ""^ P * T ^ Tl »^» Action of 

hSgorol'^^ 

l 9 H in t(4;14) multiple myetoma. Stee l Wfr^w JTL" ^__T™^ m V*™ »*» and is fused to 

Altherr MR, den Dunnen JT; Hum Mol ZTl99B^™a 2 ^ * Mo0rman AFM - 
[1180] 429. PX domain 

£££ fUnC,k,n ^ Ph ° X Pr ° tei - PLD **"»■ * «* isoform. 

[1] 

Medline: 97084820 

Novel domains in NADPH oxidase subunits, sorting nexins, and
PtdIns 3-kinases: binding partners of SH3 domains?
Ponting CP; 
Protein Sci 1996;5:2353-2357. 
[1181] 430. ParA family ATPase 
[1] 

Medline: 91141297 

A family of ATPases involved in active partitioning of diverse bacterial plasmids.
Motallebi-Veshareh M, Rouch DA, Thomas CM;

Mol Microbiol 1990;4:1455-1463. 
[1166] 421. N-(5'phosphoribosyl)anthranilate (PRA) isomerase

[1] Wilmanns M, Priestle JP, Niermann T, Jansonius JN; 

J Mol Biol 1992;223:477-507. 

[1167] 422. (PRK) Phosphoribulokinase signature 

Phosphoribulokinase (EC 2.7.1.19) (PRK) [1,2] is one of the enzymes specific to the Calvin reductive pentose phosphate
cycle, which is the major route by which carbon dioxide is assimilated and reduced by autotrophic organisms.
PRK catalyzes the ATP-dependent phosphorylation of ribulose 5-phosphate into ribulose 1,5-bisphosphate, which is
the substrate for RubisCO. PRK's of diverse origins show different properties with respect to the size of the protein,
the subunit structure, or the enzymatic regulation. However, an alignment of the sequences of PRK from plants, algae,
photosynthetic and chemoautotrophic bacteria shows that there are a few regions of sequence similarity. As a signature
pattern, one of these regions was selected.
[1168] Consensus pattern: K-[LIVM]-x-R-D-x(3)-R-G-x-[ST]-x-E

[ 1] Kossmann J., Klintworth R., Bowien B. Gene 85:247-252(1989).

[ 2] Gibson J.L., Chen J.-K., Tower P.A., Tabita F.R. Biochemistry 29:8085-8093(1990).

[1169] 423. (PRPP synt) Phosphoribosyl pyrophosphate synthetase signature

Phosphoribosyl pyrophosphate synthetase (EC 2.7.6.1) (PRPP synthetase) catalyzes the formation of PRPP from ATP 
and ribose 5-phosphate. PRPP is then used in various biosynthetic pathways, as for example in the formation of purines, 
pyrimidines, histidine and tryptophan. PRPP synthetase requires inorganic phosphate and magnesium ions for its 
stability and activity. 

In mammals, three isozymes of PRPP synthetase are found; in yeast there are at least four isozymes. 
As a signature pattern for this enzyme, a very conserved region was selected that has been suggested to be involved 
in binding divalent cations [1]. This region contains two conserved aspartic acid residues as well as a histidine, which 
are all potential ligands for a cation such as magnesium. 

[1170] Consensus pattern: D-[LI]-H-[SA]-x-Q-[IMST]-[QK4]-G-[FY]-F-x(2)-P-[LIVMFC]-D

[1171] [ 1] Bower S.G., Harlow K.W., Switzer R.L, Hoven-Jensen B. J. Biol. Chem. 264:10287-10291(1989). 

[1172] 424. (PRTP) Herpesvirus processing and transport protein 

The members of this family are associated with capsid intermediates during packaging of the virus.
Number of members: 31 
[1] 

Medline: 98362148 

Herpes simplex virus type 1 cleavage and packaging proteins 
UL15 and UL28 are associated with B but not C capsids during 
packaging. Yu D, Weller SK; 

J Virol 1998;72:7428-7439. 
[1173] 425. Photosystem I psaG / psaK (PSI PSAK) proteins signature 

Photosystem I (PSI) [1] is an integral membrane protein complex that uses light energy to mediate electron transfer
from plastocyanin to ferredoxin. It is found in the chloroplasts of plants and cyanobacteria. PSI is composed of at least
14 different subunits, two of which, PSI-G (gene psaG) and PSI-K (gene psaK), are small hydrophobic proteins of about
7 to 9 Kd and are evolutionarily related [2]. Both seem to contain two transmembrane regions. Cyanobacteria seem to
encode only PSI-K.

[1174] As a signature pattern, the best-conserved region was selected, which seems to correspond to the second
transmembrane region.

- Consensus pattern: [GT]-F-x-[LIVM]-x-[DEA]-x(2)-[GA]-x-[GTA]-[SA]-x-G-H-x-[LIVM]-[GA]
[1] Golbeck J.H. Biochim. Biophys. Acta 895:167-204(1987).

[2] Kjaerulff S., Andersen B., Nielsen V.S., Moller B.L., Okkels J.S. J. Biol. Chem. 268:18912-18916(1993).
[1175] 426. PTR2 family proton/oligopeptide symporters signatures 

A family of eukaryotic and prokaryotic proteins that seem to be mainly involved in the intake of small peptides with the
concomitant uptake of a proton has been recently characterized [1,2]. Proteins that belong to this family are:

- Fungal peptide transporter PTR2.

- Mammalian intestine proton-dependent oligopeptide transporter PeptT1.

- Mammalian kidney proton-dependent oligopeptide transporter PeptT2.

- Drosophila opt1.
and Paramecium tetraurelia r^iaa.i. T23F11.1), Leishmania chagasi

In Arabidopsis thaliana, the kinase associated protein Dhosohatase /kappi mi « 

(PDPCJHlv^ichcataUdephoSnoX^ 

[1159] Consensus pattern: [LIVMFY]-[LIVMFYA]-[GSAC]-[LIVM]-[FYC]-D-G-H-[GAV]

j 1] WJnk J., Trompete, H,L, Pettrich K.-G.. Cohen P.T.W.. CampbeH D.G.. Mieskes G. FEBS Lett. 297:135-138 
[ SI^V^™-' ^ H " *** Ce,L Bfol - 13:5408-6417(1993) 

[ 6] Bork P., Brown N.P., Hegyi H., Schultz J. Protein Sci. 5:1421-1425(1996).
PrcZ otlnJT? } P,Dtein P fen ^ ans,e ^s aipha subunit repeat signature 

to participate in a stable 00™,^?^!^'^^^^^^ The alpha subunit is thought 
protein prenyltransferases might share a Zl lt^ ' , ~ ! SUbUn,t b ' ntfe me pep,ide subs,rate - Distinct 
sequence motifs [1 J. These Zat! have diST^? ■ ^ ,he 3 ' pha and 66,3 subunit *°* repetitive 

Known protein prenyltransteT^tS 

- Mammalian protein farnesyltransferase alpha subunit 

- Yeast protein RAM2, a protein farnesyltransferase alpha subunit 

- Yeast protein BET4, a protein geranylgeranyltransferase alpha subunit

S 

PrSn „!?• fT^ o r0,e ' n P^ 3135 * 2A ^9ulatory subunit PR55 signatures 

that consists of a core composed of a^Z£IS^T«TfT transduc,ion p P2A is a trimeric enzyme 

subun« A; this complex the^aS^ia.es a mtd v" * * 65 Kd SUbUnrt (PR65 >' al *> <^ 

me hotoenzyme m'one of ^T^l^ZTX SE£?,££ ST ""^ ,0 
™™^-where three isoforn^^ 

[1164] Consensus pattern: E-F-D-Y-L-K-S-L-E-I-E-E-K-I-N

Consensus pattern: N-[AG]-H-[TA]-Y-H-I-N-S-I-S-[LIVM]-N-S-D

[1165] [ 1] Mayer-Jaekel R., Hemmings B.A. Trends Cell Biol. 4:287-291(1994). 
A duplicated catalytic motif in a new superfamily of phosphohydrolases and phospholipid synthases that includes poxvirus
envelope proteins.
Koonin EV;

Trends Biochem Sci 1996;21:242-243.

[3]Medline: 94327597

Cloning and expression of phosphatidylcholine-hydrolyzing phospholipase D from Ricinus communis L.
Wang X, Xu L, Zheng L;
J Biol Chem 1994;269:20312-20317.
[4]Medline: 97386825

Regulation of eukaryotic phosphatidylinositol-specific phospholipase C and phospholipase D.
Singer WD, Brown HA, Sternweis PC;
Annu Rev Biochem 1997;66:475-509. 
[1154] 416. (PMI typel) Phosphomannose isomerase type I signatures 

Phosphomannose isomerase (EC 5.3.1.8) (PMI) [1,2] is the enzyme that catalyzes the interconversion of mannose- 
6-phosphate and fructose-6-phosphate. In eukaryotes, it is involved in the synthesis of GDP-mannose which is a con- 
stituent of N- and O-linked glycans as well as GPI anchors. In prokaryotes, it is involved in a variety of pathways 
including capsular polysaccharide biosynthesis and D-mannose metabolism. 

Three classes of PMI have been defined on the basis of sequence similarities [1]. The first class comprises all known 
eukaryotic PMI as well as the enzyme encoded by the manA gene in enterobacteria such as Escherichia coli. Class I 
PMI's are proteins of about 42 to 50 Kd which bind a zinc ion essential for their activity. 

As signature patterns for class I PMI, two conserved regions were selected. The first one is located in the N-terminal 
section of these proteins, the second in the C-terminal half. Both patterns contain a residue involved [3] in the binding 
of the zinc ion. 

[1155] Consensus pattern: Y-x-D-x-N-H-K-P-E [E is a zinc ligand] 

- Consensus pattern: H-A-Y-[LIVM]-x-G-x(2)-[LIVM]-E-x-M-A-x-S-D-N-x-[LIVM]-R-A-G-x-T-P-K [H is a zinc ligand]
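
Since class I PMI is described by two signatures, one N-terminal and one C-terminal, a candidate sequence can be checked for both hits and for their relative order. The Python sketch below is an illustration under stated assumptions: the two regular expressions are hand transcriptions of the patterns above (with the garbled residue classes read as [LIVM]), and the names `SIG_N`, `SIG_C` and `pmi_signature_hits` as well as the test peptides are hypothetical.

```python
import re
from typing import Dict, Optional, Union

# Hand transcriptions of the two class I PMI patterns quoted above.
SIG_N = re.compile(r"Y.D.NHKPE")                                # E is a zinc ligand
SIG_C = re.compile(r"HAY[LIVM].G..[LIVM]E.MA.SDN.[LIVM]RAG.TPK")  # H is a zinc ligand

def pmi_signature_hits(sequence: str) -> Dict[str, Union[Optional[int], bool]]:
    """Report the 1-based start of the first hit for each signature and
    whether both are present with the N-terminal one upstream."""
    seq = sequence.upper()
    n_hit = SIG_N.search(seq)
    c_hit = SIG_C.search(seq)
    return {
        "n_terminal": n_hit.start() + 1 if n_hit else None,
        "c_terminal": c_hit.start() + 1 if c_hit else None,
        "both_in_order": bool(n_hit and c_hit and n_hit.start() < c_hit.start()),
    }

if __name__ == "__main__":
    # Made-up peptides that satisfy each pattern, joined by a short linker.
    n_part = "YADANHKPE"
    c_part = "HAYLAGAALEAMAASDNALRAGATPK"
    print(pmi_signature_hits("MM" + n_part + "GGGG" + c_part))
```

Requiring both signatures is stricter than accepting either one alone; whether that trade-off is appropriate depends on how fragmentary the sequences being screened are.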

[ 1] Proudfoot A.E.I., Turcatti G., Wells T.N.C., Payton M.A., Smith D.J. Eur. J. Biochem. 219:415-423(1994). 
[ 2] Coulin F., Magnenat E., Proudfoot A.E.I., Payton M.A., Scully P., Wells T.N.C. Biochemistry 32:14139-14144
(1993).

[ 3] Cleasby A., Wonacott A., Skarzynski T, Hubbard R.E., Davies G.J., Proudfoot A.E.I., Bernard A.R., Payton 
M.A., Wells T.N.C. Nat. Struct. Biol. 3:470-479(1996). 

[1156] 417. (PNP UDP 1) Purine and other phosphorylases family 1 signature 
The following phosphorylases belong to the same family:

- Purine nucleoside phosphorylase (EC 2.4.2.1) (PNP) from most bacteria (gene deoD). This enzyme catalyzes the
cleavage of guanosine or inosine to the respective bases and sugar-1-phosphate molecules [1].

- Uridine phosphorylase (EC 2.4.2.3) (UdRPase) from bacteria (gene udp) and mammals. Catalyzes the cleavage
of uridine into uracil and ribose-1-phosphate. The products of the reaction are used either as carbon and energy
sources or in the rescue of pyrimidine bases for nucleotide synthesis [2].

- 5'-methylthioadenosine phosphorylase (EC 2.4.2.28) (MTA phosphorylase) from Sulfolobus solfataricus [3].

As a signature pattern, a conserved region was selected in the central part of these enzymes.
[1157] Consensus pattern: [GST]-x-G-[LIVM]-G-x-[PA]-S-x-[GSTA]-I-x(3)-E-L

- Note: mammalian and some bacterial PNP as well as eukaryotic MTA phosphorylase belong
to a different family of phosphorylases (see <PDOC00954>).

[ 1] Takehara M., Ling F., Izawa S., Inoue Y., Kimura A. Biosci. Biotechnol. Biochem. 59:1987-1990(1995).

[ 2] Watanabe S.-I., Hino A., Wada K., Eliason J.F., Uchida T. J. Biol. Chem. 270:12191-12196(1995).

[ 3] Cacciapuoti G., Porcelli M., Bertoldo C., De Rosa M., Zappia V. J. Biol. Chem. 269:24762-24769(1994).

[1158] 418. (PP2C) Protein phosphatase 2C signature 

Protein phosphatase 2C (PP2C) is one of the four major classes of mammalian serine/threonine specific protein phos- 
phatases (EC 3.1.3.16). PP2C [1] is a monomeric enzyme of about 42 Kd which shows broad substrate specificity and 
is dependent on divalent cations (mainly manganese and magnesium) for its activity. Its exact physiological role is still 
unclear. Three isozymes are currently known in mammals: PP2C-alpha, -beta and -gamma. In yeast, there are at least 
[ 3] Rhee S.G., Choi K.D. J. Biol. Chem. 267:12393-12396(1992).

[ 4] Sternweis P.C., Smrcka A.V. Trends Biochem. Sci. 17:502-506(1992).

[1149] 413. (PI-PLC-Y) Phosphatidylinositol-specific phospholipase C profiles

between these two t^bc^So^l^\^ ^? * ^ iSOforTns ' tho 

one SH3 domain are LertiTeWSe l^ZcTJZT™ ™ ,W ° SH2 *"»*«. and 

to be bnportan, for the Z cZZSl^Z ^ conse » ved ^ have been shown 

possibly involved h C^JL^!Z^^SS ' " 3 " ^ «~ < PDOC °°380» 

eukaryotic counterparts. pr-saryow enzymes show no similanty to their 

Two profiles were developed, one covering the X-box, the other the Y-box.

[ 1] Meldrum E., Parker P.J., Carozzi A. Biochim. Biophys. Acta 1092:49-71(1991).
[ 2] Rhee S.G., Choi K.D. Adv. Second Messenger Phosphoprotein Res. 26:35-61(1992).

[ 3] Rhee S.G., Choi K.D. J. Biol. Chem. 267:12393-12396(1992).
[ 4] Sternweis P.C., Smrcka A.V. Trends Biochem. Sci. 17:502-506(1992).

[1150] 414. (PK) Pyruvate kinase active site signature

(gene pykF) and PK-II (gene pykA) All PK JvZ, Tl , T\ Eschenchla 0011 ,ne '<» ™ two isozymes: PK-I 
acid rescues. W SOZymeS S98m to 1,6 ,etramers °' ""teal subunrts of about 500 amino 

zcmTZIC™ZZ^ ~ ? ,eC,ed ,hal inC ' UdeS 3 ^ *-* seems to be the 

pleated in the TJ"Zu7Z ™ * ^ ^ en ° ,PyrUVa,e - and 3 ^ *°« - 

PC is the 

[1152] [ 1] Muirhead H. Biochem. Soc. Trans. 18:193-196(1990).
[1153] 415. (PLDc) Phospholipase D. Active site motif

SSK2?£X^ iso, n? are ■ c *- d by ^ <arfs> 

PC-hyd^ng PLO . a homologue of cardan synthase. ^ a ^ e JZ^L^ P LDs , ^ viral 

we^^^^^^ » .he presence of lw motifs confining 

E. co,i endonuclease if^^SS^^ISlT' J"" 1 *? l ° *" ^ ^ * 
The profile contained here represents <£^pSET££ 1 reZf J* **? "* " *- ^ 
repeat units has not been achieved. 9 ' a " aCCUra,e mul,i P ,e aBgnment of the 

Number of members: 139
[1]

Medline: 96303814

Ponting CP, Kerr ID;

Protein Sci 1996;5:914-922. 
[2]Medline: 96334293 
Insulin Receptor Substrate 1 (IRS-1).

- Regulators of small G-proteins like guanine nucleotide releasing factor GNRP (Ras-GRF) (which contains 2 PH 
domains), guanine nucleotide exchange proteins like vav, dbl, SoS and yeast CDC24, GTPase activating proteins 
like rasGAP and BEM2/IPL2, and the human break point cluster protein bcr. 

- Cytoskeletal proteins such as dynamin (see <PDOC00362>), Caenorhabditis elegans kinesin-like protein unc-104
(see <PDOC00343>), spectrin beta-chain, syntrophin (2 PH domains) and yeast nuclear migration protein NUM1.

- Mammalian phosphatidylinositol-specific phospholipase C (PI-PLC) (see <PDOC50007>) isoforms gamma and
delta. Isoform gamma contains two PH domains; the second one is split into two parts separated by about 400
residues.

- Oxysterol binding proteins OSBP, yeast OSH1 and YHR073w.

- Mouse protein citron, a putative rho/rac effector that binds to the GTP-bound forms of rho and rac.

- Several yeast proteins involved in cell cycle regulation and bud formation like BEM2, BEM3, BUD4 and the
BEM1-binding proteins BOI2 (BEB1) and BOI1 (BOB1).

- Caenorhabditis elegans protein MIG-10.

- Caenorhabditis elegans hypothetical proteins C04D8.1, K06H7.4 and ZK632.12.

- Yeast hypothetical proteins YBR129c and YHR155w.

The profile for the PH domain, which was developed by Toby Gibson at the EMBL, covers the total length of the
domain. Several proteins contain large insertions in the PH domain and are thus difficult to detect with this profile. In
some of these cases, the profile will align to only one half of the PH domain.

- Sequences known to belong to this class detected by the pattern: ALL. Note, however, that while all sequences
containing PH domains are detected, not all PH domains are. Some of the split domains lie below the
cutoff threshold.

[ 1] Mayer B.J., Ren R., Clark K.L., Baltimore D. Cell 73:629-630(1993).

[ 2] Haslam R.J., Koide H.B., Hemmings B.A. Nature 363:309-310(1993).

[ 3] Musacchio A., Gibson T.J., Rice P., Thompson J., Saraste M. Trends Biochem. Sci. 18:343-348(1993).

[ 4] Gibson T.J., Hyvonen M., Musacchio A., Saraste M., Birney E. Trends Biochem. Sci. 19:349-353(1994).

[ 5] Pawson T. Nature 373:573-580(1995).

[ 6] Ingley E., Hemmings B.A. J. Cell. Biochem. 56:436-443(1994).

[ 7] Saraste M., Hyvonen M. Curr. Opin. Struct. Biol. 5:403-408(1995).

[ 8] Riddihough G. Nat. Struct. Biol. 1:755-757(1994).

411. PHD-finger 
[1] 

Medline: 95216093 

The PHD finger: implications for chromatin-mediated transcriptional regulation.
Aasland R, Gibson TJ, Stewart AF; 

Trends Biochem Sci 1995;20:56-59. 
Number of members: 181 

[1148] 412. (PI-PLC-X) Phosphatidylinositol-specific phospholipase C profiles

Phosphatidylinositol-specific phospholipase C (EC 3.1.4.11), a eukaryotic intracellular enzyme, plays an important role
in signal transduction processes [1]. It catalyzes the hydrolysis of 1-phosphatidyl-D-myo-inositol-3,4,5-triphosphate
into the second messenger molecules diacylglycerol and inositol-1,4,5-triphosphate. This catalytic process is tightly
regulated by reversible phosphorylation and binding of regulatory proteins [2 to 4].

In mammals, there are at least 6 different isoforms of PI-PLC; they differ in their domain structure, their regulation, and
their tissue distribution. Lower eukaryotes also possess multiple isoforms of PI-PLC.

All eukaryotic PI-PLCs contain two regions of homology, sometimes referred to as 'X-box' and 'Y-box'. The order of
these two regions is always the same (NH2-X-Y-COOH), but the spacing is variable. In most isoforms, the distance
between these two regions is only 50-100 residues, but in the gamma isoforms one PH domain, two SH2 domains, and
one SH3 domain are inserted between the two PLC-specific domains. The two conserved regions have been shown
to be important for the catalytic activity. At the C-terminal end of the Y-box, there is a C2 domain (see <PDOC00380>)
possibly involved in Ca-dependent membrane attachment.

Profile analysis shows that sequences with significant similarity to the X-box domain occur also in prokaryotic and
trypanosome PI-specific phospholipases C. Apart from this region, the prokaryotic enzymes show no similarity to their
eukaryotic counterparts.

Two profiles were developed, one covering the X-box, the other the Y-box.
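
The fixed NH2-X-Y-COOH order and the variable spacing described above can be checked mechanically once both profiles have produced a hit on a sequence. The following Python fragment is only an illustrative sketch: the `Hit` type, the function names, the coordinates in the usage note and the 100-residue threshold (taken from the "50-100 residues" figure quoted above) are assumptions made for the example and are not part of any profile-search software.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """Hypothetical container for one profile hit (1-based, inclusive)."""
    start: int
    end: int


def xy_spacer_length(x_box: Hit, y_box: Hit) -> int:
    """Return the number of residues lying between the X-box and the Y-box.

    Raises ValueError when the Y-box is not entirely C-terminal to the X-box,
    i.e. when the NH2-X-Y-COOH order is violated.
    """
    if y_box.start <= x_box.end:
        raise ValueError("expected NH2-X-Y-COOH order without overlap")
    return y_box.start - x_box.end - 1


def has_long_insertion(x_box: Hit, y_box: Hit, typical_max: int = 100) -> bool:
    """Flag spacers longer than the typical 50-100 residue range (assumed cutoff)."""
    return xy_spacer_length(x_box, y_box) > typical_max
```

With made-up coordinates, `xy_spacer_length(Hit(300, 450), Hit(520, 640))` gives a spacer of 69 residues, whereas hits at 300-450 and 950-1070 would be flagged by `has_long_insertion`, as expected for an isoform carrying inserted domains between the two boxes.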

[ 1] Meldrum E., Parker P.J., Carozzi A. Biochim. Biophys. Acta 1092:49-71(1991).
[ 2] Rhee S.G., Choi K.D. Adv. Second Messenger Phosphoprotein Res. 26:35-61(1992).
[1144] [ 1] Watson H.C., Littlechild J.A. Biochem. Soc. Trans. 18:187-190(1990).
[1145] ~<PGMPMM,Pho*K*^^ 

- Phosphomannomutase (EC 5 4 2 81 SET pKTL « breakdown and synthesis of glucose [1]. 

- Urease operon protein ureC from Helicobacter pylori. 
Escherichia coli protein mrsA. 

" TmJZT te,raUre,ia Para,USin, 8 P^^^coprotein involved in exocytosis 

A Methanols vann.eh, hypothetic*, protein in .he 3'region of ,he genX ribosomal protein S10. 

SIT" C ° nSenSUS Pat,6rn: [GSAHLI ^ UVM HSWGA h S-H-x- P -x(4 H GNH^ ,S Is the phosphoserine res, 
- Note: PMM from fungi do not belong to this family.

I » Jit 8 "' UU n Y ; ^ Jf ' K0 ° nO M - J ' Bi °'- Chera 267:6322-6337(1992) 
3 7 TTH f -nt 0 S J - R ° mana LK ' Reeves RR - Mo1 - G ^ Genet. 227 1 73-1 80(1 991) 
2 ? , o ^' Chakrabart V * M - B^ry A. J. Biol. Chem. 266:9754-9763(1991) >- 

[ 6] Subramanian S.V., Wyroba E., Andersen A.P., Satir B.H. Proc. Natl. Acad. Sci. U.S.A. 91:9832-9836(1994).
[1147] 410. PH domain profile 

- binding to lipids, e.g. phosphatidylinositol-4,5-bisphosphate 

- binding to phosphorylated Ser/Thr residues, 

- attachment to membranes by an unknown mechanism. 

Th'e 3? s S ,n!n, ,hat ? Brent , PH d0mainS h3Ve ' 0,a,ly ditferent requirements 

beta-strands dfler great^n 4^ ^ **• connecting the 9 

residues within the PH domain. reeirveiy amicult to detect. There are no totally invariant 

Proteins reported to contain one or more PH domains belong to the following families:
- Tyrosine protein kinases belonging to the Btk/Itk/Tec subfamily.
of phosphoglycerate [1,2]. Both enzymes can catalyze three different reactions, although in different proportions:

- The isomerization of 2-phosphoglycerate (2-PGA) to 3-phosphoglycerate (3-PGA) with 2,3-diphosphoglycerate
(2,3-DPG) as the primer of the reaction.

- The synthesis of 2,3-DPG from 1,3-DPG with 3-PGA as a primer.

- The degradation of 2,3-DPG to 3-PGA (phosphatase EC 3.1.3.13 activity).

In mammals, PGAM is a dimeric protein. There are two isoforms of PGAM: the M (muscle) and B (brain) forms. In 
yeast, PGAM is a tetrameric protein. BPGM is a dimeric protein and is found mainly in erythrocytes where it plays a 
major role in regulating hemoglobin oxygen affinity as a consequence of controlling 2,3-DPG concentration. 
The catalytic mechanism of both PGAM and BPGM involves the formation of a phosphohistidine intermediate [3].
The bifunctional enzyme 6-phosphofructo-2-kinase / fructose-2,6-bisphosphatase (EC 2.7.1.105 and EC 3.1.3.46)
(PF2K) [4] catalyzes both the synthesis and the degradation of fructose-2,6-bisphosphate. PF2K is an important
enzyme in the regulation of hepatic carbohydrate metabolism. Like PGAM/BPGM, the fructose-2,6-bisphosphatase
reaction involves a phosphohistidine intermediate, and the phosphatase domain of PF2K is structurally related to
PGAM/BPGM.

The bacterial enzyme alpha-ribazole-5'-phosphate phosphatase (gene cobC), which is involved in cobalamin
biosynthesis, also belongs to this family [5].

A signature pattern was built around the phosphohistidine residue. 

[1140] Consensus pattern: [LIVM]-x-R-H-G-[EQ]-x(3)-N [H is the phosphohistidine residue]

- Note: some organisms harbor a form of PGAM independent of 2,3-DPG; this enzyme is not related to the family
described above [6].
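
Consensus patterns written in this notation can be applied to a sequence by translating them into a regular expression. The Python fragment below is a minimal sketch of such a translation, not the tool actually used to generate the signatures; the function name `prosite_to_regex` is hypothetical, and only the elements seen in the patterns of this section are handled (fixed residues, `x`, bracketed residue sets, braces for excluded residues, and repeat counts such as `x(3)` or `x(11,14)`). Anchors and other rarer syntax are deliberately left out.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style consensus pattern into a Python regex string."""
    parts = []
    for element in pattern.split("-"):
        # Separate the residue specification from an optional repeat count,
        # e.g. "x(3)" -> ("x", "3", None) and "x(11,14)" -> ("x", "11", "14").
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.group(1), m.group(2), m.group(3)
        if core == "x":
            token = "."                        # any residue
        elif core.startswith("{") and core.endswith("}"):
            token = "[^" + core[1:-1] + "]"    # any residue except those listed
        else:
            token = core                       # "[LIVM]" or a single fixed residue
        if low is not None:
            token += "{" + low + ("," + high if high else "") + "}"
        parts.append(token)
    return "".join(parts)


# Pattern built around the phosphohistidine residue (see above).
PGAM_REGEX = re.compile(prosite_to_regex("[LIVM]-x-R-H-G-[EQ]-x(3)-N"))
```

As a usage example, on the made-up fragment `AAVYLLRHGESEWNKK` (not taken from any database entry) `PGAM_REGEX.search` finds the stretch `LLRHGESEWN`, whose fourth residue is the histidine corresponding to the phosphohistidine position of the pattern.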

[ 1] Le Boulch P., Joulin V., Garel M.-C., Rosa J., Cohen-Solal M. Biochem. Biophys. Res. Commun. 156:874-881(1988).

[ 2] White M.F., Fothergill-Gilmore L.A. FEBS Lett. 229:383-387(1988).

[ 3] Rose Z.B. Meth. Enzymol. 87:43-51(1982).

[ 4] Bazan J.F., Fletterick R.J., Pilkis S.J. Proc. Natl. Acad. Sci. U.S.A. 86:9642-9646(1989).

[ 5] O'Toole G.A., Trzebiatowski J.R., Escalante-Semerena J.C. J. Biol. Chem. 269:26503-26511(1994).

[ 6] Grana X., De Lecea L., El-Maghrabi M.R., Urena J.M., Caellas C., Carreras J., Puigdomenech P., Pilkis S.J.,
Climent F. J. Biol. Chem. 267:12797-12803(1992).

[1141] 407. (PGI) Phosphoglucose isomerase signatures 

Phosphoglucose isomerase (EC 5.3.1.9) (PGI) [1,2] is a dimeric enzyme that catalyzes the reversible isomerization of
glucose-6-phosphate and fructose-6-phosphate. PGI is involved in different pathways: in most higher organisms it is
involved in glycolysis; in mammals it is involved in gluconeogenesis; in plants in carbohydrate biosynthesis; in some
bacteria it provides a gateway for fructose into the Entner-Doudoroff pathway. PGI has been shown [3] to be identical
to neuroleukin, a neurotrophic factor which supports the survival of various types of neurons.

The sequence of PGI from many species ranging from bacteria to mammals is available and has been shown to be
highly conserved. As signature patterns for this enzyme two conserved regions were selected: the first region is located
in the central section of PGI, while the second one is located in its C-terminal section.

[1142] Consensus pattern: [DENS]-x-[LIVM]-G-G-R-[FY]-S-[LIVMT]-x-[STA]-[PSAC]-[LIVMA]-G

- Consensus pattern: [GS]-x-[LIVM]-[LIVMFYW]-x(4)-[FY]-[DN]-Q-x-G-V-E-x(2)-K

[ 1] Achari A., Marshall S.E., Muirhead H., Palmieri R.H., Noltmann E.A. Philos. Trans. R. Soc. Lond. B Biol.
Sci. 293:145-157(1981).

[ 2] Smith M.W., Doolittle R.F. J. Mol. Evol. 34:544-545(1992). 

[ 3] Faik P, Walker J.I.H., Redmill A.A.M., Morgan M.J. Nature 332:455-456(1988). 

[1143] 408. (PGK) Phosphoglycerate kinase signature 

Phosphoglycerate kinase (EC 2.7.2.3) (PGK) [1] catalyzes the second step in the second phase of glycolysis, the
reversible conversion of 1,3-diphosphoglycerate to 3-phosphoglycerate with generation of one molecule of ATP. PGK
is found in all living organisms and its sequence has been highly conserved throughout evolution. It is a two-domain
protein; each domain is composed of six repeats of an alpha/beta structural motif. As a signature pattern for PGKs, a
conserved region in the N-terminal region was selected.

Consensus pattern: [KRHGTCVN]-[VT]-[LIVMF]-[LIVMC]-R-x-D-x-N-[SACV]-P



EP 1 033 405 A2 



<PDOC0001 7» and is also part of the ATP-LTC^in f2L TP " b " din9 *' < see 

[1130] Consensus pattern: L-I-G-D-D-E-H-x-W-x-[DE]-x-G-[IV]-x-N

S^ISSSSSEZS =~ (ho 

oxaloacetate and phosphate. The enzyme is louStaSSJS pho ?* oen( *» n ««to by bicarbonate to yield 
a rysine [2, have been'imp.fcated ^,1^^^ 

residues are highly conserved in PEPcase from varin,,* n^T. 1 . ^ ' re9K3ns around ,nese ac «ve site 

signature patterns for this type <Je*JmT ' ' ^ 3011 c » anota ^ can be used as a 

[1132] Consensus pattern: tVT>x-T-A-H-P-T-fEOI-xf21-R-fKRMt rM i- = n =^-- 

Consensus pattern: [IV]-M-[LIVMH3-Y-S-D-S , K wSSSn?-^ i3a " aCtiVe s,te res,due ]- 

ri133l r 11 tL,h= I . • " ™ J b "-S-x-K-D-lSTAGJ-G [K is an actrve site residuel- 

[H33J [ 1] Terada K., Izui K. Eur. J. Bbchem 202797-B03nq<mr:>i i a „ ™ bj 

M.H.. Andreo C.S. Bicchim. Biophys. Acta ^ RE " Ch0 " e, ° PLea,y 

[1134] 404. PET112 family signature 

The following proteins from eukaryotes, prokaryotes and archaebacteria belong to the same family:

" SJ^SSS 2 n WHiCh ^ ^ Unkn0W1 "* * - « °< *«ochondria, genes, 

- Aspergillus nidulans mitochondrial protein nempA 

- Bacillus subtilis hypothetical protein yzdD 

- Moraxella catarrhalis hypothetical protein in bloR-1 Region 

- Mycoplasma genitalium hypothetical protein MG100 

- Methanococcus jannaschii hypothetical proteins MJ0019 and MJ0160.

ZS^ZZ^Z^ ,0 ami "° aC ' dS - AS 9 « conserved region fccated ,n 

?° nsensus P 3 *™ f D N]-x-[DN]-R-x(3)-P- L -[LIV]-E-fLIVl-x-rs-n-x P 
S 3 [ 1] Mulero J.J.. Rosens. J.K.. Fox T.D. Cu r. Gen 25 ZSSSZ** 
[1137] 405. (PFK) Phosphofructokinase signature

Phosphofructokinase (EC 2 7 1 m /pfio n 01 « i. 

P^o^by ATPo,^^^ h ,«- pathway. I, catah/zes the 

^*^ni,, l nmamma te i,sa^ 

wh«h are highly related to the bacterial 36 Kd subunrtsTn Hu2 ? Kdsub " nrtcons,s «o'two homologous domains 
zymes: PFKM (muscle). PFKL (Irver). and PFKP SSi U^S? "* °' of PFK iso- 

chains (gene PFK1 ) and four 1 00 Kd beta c^^^^El" "T™! * *« 100 Kd a.pha 

subunits are composed of two homologous d££ST mammalian 80 Kd subunits. the yeast 100 Kd 

IVetr Pa " em '° r 3 *- ~**» *~ ^vofved h fructoses-phosphate bind„ g 

Kp^^ 

pShogTceS 
-edenzymes^hcaUze reac,^ 
A., Burgess P.M.J. Nucleic Acids Res. 18:261-265(1990).

[ 4] O'Reilly D.R., Crawford A.M., Miller L.K. Nature 337:606-606(1989).



[1121] 399. (PDT) Prephenate dehydratase signatures

Prephenate dehydratase (EC 4.2.1.51) (PDT) catalyzes the decarboxylation of prephenate into phenylpyruvate. In 
microorganisms PDT is involved in the terminal pathway of the biosynthesis of phenylalanine. In some bacteria such 
as Escherichia coli PDT is part of a bifunctional enzyme (P-protein) that also catalyzes the transformation of chorismate
into prephenate (chorismate mutase) while in other bacteria it is a monofunctional enzyme. The sequence of
monofunctional PDT aligns well with the C-terminal part of that of P-proteins [1].

As signature patterns for PDT two conserved regions were selected. The first region contains a conserved threonine 
which has been said to be essential for the activity of the enzyme in E. coli. The second region includes a conserved 
glutamate. Both regions are in the C-terminal part of PDT. 

[1122] Consensus pattern: [FY]-x-[LIVM]-x(2)-[LIVM]-x(5)-[DN]-x(5)-T-R-F-[LIVMW]-x-[LIVM]
[1123] [ 1] Fischer R.S., Zhao G., Jensen R.A. J. Gen. Microbiol. 137:1293-1301(1991). 
[1124] 400. PDZ domain (also known as DHR or GLGF).
[1125] PDZ domains are found in diverse signaling proteins.
[1126] [1] Ponting CP, Phillips C, Davies KE, Blake DJ;
Bioessays 1997;19:469-479.

[2] Doyle DA, Lee A, Lewis J, Kim E, Sheng M, MacKinnon R; Cell 1996;85:1067-1076.

[3] Ponting CP; Protein Sci 1997;6:464-468.

[1127] 401. (PPDK_N_term) PEP-utilizing enzymes signatures

A number of enzymes that catalyze the transfer of a phosphoryl group from phosphoenolpyruvate (PEP) via a phospho- 
histidine intermediate have been shown to be structurally related [1,2,3,4]. These enzymes are: 

- Pyruvate,orthophosphate dikinase (EC 2.7.9.1) (PPDK). PPDK catalyzes the reversible phosphorylation of pyruvate
and phosphate by ATP to PEP and diphosphate. In plants PPDK functions in the direction of the formation of
PEP, which is the primary acceptor of carbon dioxide in C4 and crassulacean acid metabolism plants. In some
bacteria, such as Bacteroides symbiosus, PPDK functions in the direction of ATP synthesis.

- Phosphoenolpyruvate synthase (EC 2.7.9.2) (pyruvate,water dikinase). This enzyme catalyzes the reversible
phosphorylation of pyruvate by ATP to form PEP, AMP and phosphate, an essential step in gluconeogenesis when 
pyruvate and lactate are used as a carbon source. 

- Phosphoenolpyruvate-protein phosphotransferase (EC 2.7.3.9). This is the first enzyme of the phosphoenolpyru- 
vate-dependent sugar phosphotransferase system (PTS), a major carbohydrate transport system in bacteria. The 
PTS catalyzes the phosphorylation of incoming sugar substrates concomitant with their translocation across the 
cell membrane. The general mechanism of the PTS is the following: a phosphoryl group from PEP is transferred 
to enzyme-I (EI) of PTS, which in turn transfers it to a phosphoryl carrier protein (HPr). Phospho-HPr then transfers
the phosphoryl group to a sugar-specific permease. 



All these enzymes share the same catalytic mechanism: they bind PEP and transfer the phosphoryl group from it to a 
histidine residue. The sequence around that residue is highly conserved and can be used as a signature pattern for 
these enzymes. As a second signature pattern a conserved region was selected in the C-terminal part of the PEP- 
utilizing enzymes. The biological significance of this region is not yet known. 

[1128] Consensus pattern: G-[GA]-x-[TN]-x-H-[STA]-[STAV]-[LIVM](2)-[STAV]-[RG] [H is phosphorylated]

- Consensus pattern: [DEQSK]-x-[LIVMH-S-[LIVMF]-G-[ST]-N^ 

[GAS)-x(2)-R 1 1 

[ 1] Reizer J., Hoischen C., Reizer A., Pham T.N., Saier M.H. Jr. Protein Sci. 2:506-521(1993).

[ 2] Reizer J., Reizer A., Merrick M.J., Plunkett G. III, Rose D.J., Saier M.H. Jr. Gene 181:103-108(1996).

[ 3] Pocalyko D.J., Carroll L.J., Martin B.M., Babbitt P.C., Dunaway-Mariano D. Biochemistry 29:10757-10765(1990).

[ 4] Niersbach M., Kreuzaler F., Geerse R.H., Postma P., Hirsch H.J. Mol. Gen. Genet. 232:332-336(1992).
[1129] 402. (PEPCK ATP) Phosphoenolpyruvate carboxykinase (ATP) signature

Phosphoenolpyruvate carboxykinase (ATP) (EC 4.1.1.49) (PEPCK) [1] catalyzes the formation of phosphoenolpyruvate
by decarboxylation of oxaloacetate while hydrolyzing ATP, a rate limiting step in gluconeogenesis (the biosynthesis of 
glucose). 

The sequence of this enzyme has been obtained from Escherichia coli, yeast, and Trypanosoma brucei; these three
sequences are evolutionarily related and share many regions of similarity. As a signature pattern a highly conserved
and maturation. This protein belongs to a family that also includes:

- Drosophila antennal protein A5, a putative odorant-binding protein 

- Onchocerca volvulus antigen Ov-16 and the related proteins D1, D2 and D3.
- Plasmodium falciparum putative phosphatidylethanolamine-binding protein.

Sir™ 

protein YLR179C. ^^^^^ ^^^^^P^'nisnotveryclear.-Yeasthypothetical 

- Caenorhabditis elegans hypothetical protein F40A3.3. 

^ COnSefVed SS,eC,ed ^ fe h *•«** •» third of the 

[1114] Consensus pattern: (FYLhx-{LyHUVF].x-rnVHDChP-D-x-P-[SN]-x(10)-H 

*^m**" ^ m ' Penn BUCqUOV S - JOBeS P - *»»«*•" F. jl Mol. Eve, 

[ 2] Schoentgen F., Jolles P. FEBS Lett. 369:22-26(1995).

[1115] 396. PCI domain 

This domain has also been called the PINT motif (Proteasome,
Int-6, Nip-1 and TRIP-15) [1].
Number of members: 49
[1] 

Medline: 98308842 

M^TZteS™™ h ,hr8e COmP ' eXeS - 

Trends Biochem Sci 1998;23:204-205.
[2]Medline: 98266368 

H n^s p cr somesubunftsarere ^^ 

Protein Sci 1998;7:1250-1254 
yltransferase)isan enzyme that cataSSS^^^ 

groups of D-aspartyl or L-jscasrTrtvE 1« Tf , ™W9™P «rom S-adenosylmethionine to the free carboxyl 
■-sparty.resid^L^^^ 

of normal L-asparty. and L^oi^^Z T * Nation and/or tsomerization 

damaged proteins; me enzyrLicmeCS 
L-aspartylresklues. PcwL a we^L*^ 

wlil^ MCFadde " H J - O-CoU Crake's. C«np. Biochem. Physio,. 117hc 

398. (PCNA) Proliferating cell nuclear antigen signatures

wlh the vital encoded DNA p<tymeiase An rZ^JlSS t .OUT" " 

• Consent |RKAJ<:-|De]-{RHl-x(3)^UVMF]-x(3)-(LIVM|.x-{SGAN|-jUVMF]-x-K-|LtVMF)(2) 

! '! ?™"- F ~* a ' B »™»» R*. r*D«™U-B» TO H. New™ 326 515.5171HS71 

1 4 ^ u Heu, S.. „«. „, Ke^, s.. H^o a Eul . , iHa^, 
[ 5] Ulrich S.J., Anderson C.W., Mercer W.E., Appella E. J. Biol. Chem. 267:15259-15262(1992).
[1107] 391. (P5CR) Delta 1-pyrroline-5-carboxylate reductase signature

Delta 1-pyrroline-5-carboxylate reductase (P5CR) (EC 1.5.1.2) [1,2] is the enzyme that catalyzes the terminal step in
the biosynthesis of proline from glutamate, the NAD(P)H-dependent reduction of 1-pyrroline-5-carboxylate into proline.
The sequences of P5CR from eubacteria (gene proC), archaebacteria and eukaryotes show only a moderate level of 
overall similarity. As a signature pattern, the best conserved region located in the C-terminal section of P5CR was 
selected. 

[1 1 08] Consensus pattern: (PAU^x(2»3)-[LIV]-x(3HUVM]-[STAC 

[LMFHDENQK] ' 1 J 1 J * ' 

[ 1] Delauney A.J., Verma D.P. Mol. Gen. Genet. 221:299-305(1990).

[ 2] Savioz A., Jeenes D.J., Kocher H.P., Haas D. Gene 86:107-111 (1990). 

[1109] 392. Poly-adenylate binding protein, unique domain. 

[1110] 393. (PAL) Phenylalanine and histidine ammonia-lyases active site 

Phenylalanine ammonia-lyase (EC 4.3.1.5) (PAL) is a key enzyme of plant and fungal phenylpropanoid metabolism
which is involved in the biosynthesis of a wide variety of secondary metabolites such as flavonoids, furanocoumarin
phytoalexins and cell wall components. These compounds have many important roles in plants during normal growth
and in responses to environmental stress. PAL catalyzes the removal of an ammonia group from phenylalanine to form
trans-cinnamate.

Histidine ammonia-lyase (EC 4.3.1.3) (histidase) catalyzes the first step in histidine degradation, the removal of an 
ammonia group from histidine to produce urocanic acid. 

The two types of enzymes are functionally and structurally related [1]. They are the only enzymes which are known to
have the modified amino acid dehydroalanine (DHA) in their active site. A serine residue has been shown [2,3,4] to be 
the precursor of this essential electrophilic moiety. The region around this active site residue is well conserved and 
can be used as a signature pattern. 
[1111] Consensus pattern: G4STC^^ 

[ 1] Taylor R.G., Lambert M.A., Sexsmith E., Sadler S.J., Ray P.N., Mahuran D.J., McInnes R.R. J. Biol. Chem.
265:18192-18199(1990).

[ 2] Langer M., Reck G., Reed J., Retey J. Biochemistry 33:6462-6467(1994). 

[ 3] Schuster B., Retey J. FEBS Lett. 349:252-254(1994). 

[ 4] Taylor R.G., McInnes R.R. J. Biol. Chem. 269:27473-27477(1994).

[1112] 394. PAS domain 

-!- CAUTION. This family does not currently match all known examples of PAS domains. 
PAS motifs appear in archaea, eubacteria and eukarya. Probably 
the most surprising identification of a PAS domain was that in 
EAG-like K+-channels [1,3].
Number of members: 308 
[1] 

Medline: 97446881 

PAS domain S-boxes in archaea, bacteria and sensors for oxygen and redox. 
Zhulin IB, Taylor BL, Dixon R;
Trends Biochem Sci 1997;22:331-333. 
[2]Medline: 95275818 

1.4 A structure of photoactive yellow protein, a cytosolic photoreceptor: unusual fold, active site, and chromophore.
Borgstahl GE, Williams DR, Getzoff ED;
Biochemistry 1995;34:6278-6287. 
[3]Medline: 98044337 
PAS: a multifunctional domain family comes to light.
Ponting CP, Aravind L; 
Curr Biol 1997;7:674-677. 
[1113] 395. (PBP) Phosphatidylethanolamine-binding protein family signature 

Mammalian phosphatidylethanolamine-binding protein (also known as basic cytosolic 21 Kd protein) is a 186 residue
protein found in a variety of tissues [1]. It binds hydrophobic ligands, such as phosphatidylethanolamine, but also seems
[2] to bind nucleotides such as GTP and FMN. It is suggested that it could act in membrane remodeling during growth
Medline: 93110040



The NADH:ubiquinone oxidoreductase (complex I) of respiratory chains.
Walker JE;
Q Rev Biophys 1992;25:253-324.

02 m wSklf p qf) NADH-ub.qu.none oxidoreductase chain 4. amino terminus 
[1102] [1] Walker JE; Q Rev Biophys 1992;25:253-324.

m£Z comp ex k*aS ?h t?££ mTn^drSmh "V?!**"" * « oligomer* 

cyanobacteria (as a NADM-plas^T^^^ TV T and h 

getic enzyme comp.ex ther? is onl with a SS£ weSt^S ^ m l SUbUni,S * bfoener - 

- N"^"^^ _ . . . 
encoded in Paramecium (gene psbG). - , ,!euros Poracra a sa.-i^iiochonanai 

- Chloroplast encoded in various higher plants (gene ndhK or psbG). 

The 20 Kd subunit is highly similar to [4]: 

- Synechocystis strain PCC 6803 proteins psbG1 and psbG2 

" tS!H ^ o, h p riChia C0 " oxidoreductase (gene nuoB). 

- SbunI 7 ?? °* Pa,acoccus denitrificans NADH-ubiquinone oxidoreductase. 
- Subunit 7 of Escherichia coli formate hydrogenlyase (gene hycG).

- Subunit I of Escherichia coli hydrogenase-4 (gene hyfI).

[ 1] Ragan C.I. Curr. Top. Bioenerg. 15:1-36(1987).

[ 2] Weiss H., Friedrich T., Hofhaus G., Preis D. Eur. J. Biochem. 197:563-576(1991).

! wT" V n - M ' • Skeh91 JM " Wa,ter J E - FEBS Lett- 301 237 242(1 992) 

[ 4J We,dner U.. Ge.er S.. Flock A.. Frieda T, Lerf H., Weiss H. J. Mol. B»SS 9 93). 

[1105] 390. p53 tumor antigen signature 

cel., ,t is frequent* mutator Si t^r^ZT T ^ * "° W leVete h ««* 

but probably not all, tumor types. P 53 is p^Z^^SS^^ T, "* 88 3 tUm ° r su PP' ess °' *» some. 

to about 300). and a highly ba2 cTmS, ™£ Th ^ a ' n (POS ' ,,0n 80 ,0 150 >" a ce " ,ral (from 150 
attempts to L» p53?n l^SiS^^J^^ ~ h "«"~ — = 

^XTr T Ce^^ * - -,a. rea*n of the protein 

large T antigen of SV40. In rrZhis reZ is The Sus ofTl J °? 9 " adjaCem in thB bindin 9 <* <"* 

[1106] Consensus pattern: I^SSS^SSSS^ P0,m ' TU " a,i0nS h C8nc,,0 '» ,Umors " 

[ 1] Levine A.J., Momand J., Finlay C.A. Nature 351:453-456(1991).
[ 2] Levine A.J., Momand J. Biochim. Biophys. Acta 1032:119-136(1990).

[ 3] Soussi T., Caron de Fromentel C., May P. Oncogene 5:945-952(1990).
[ 4] Lane D.P., Benchimol S. Genes Dev. 4:1-8(1990).
[ 3] Denhardt D.T., Guo X. FASEB J. 7:1475-1482(1993).
[1090] 382. Oxysterol-binding protein family signature 

A number of eukaryotic proteins that seem to be involved with sterol synthesis and/or its regulation have been found
[1] to be evolutionarily related:

- Mammalian oxysterol-binding protein (OSBP). A protein of about 800 amino-acid residues that binds a variety of
oxysterols: oxygenated derivatives of cholesterol. OSBP seems to play a complex role in the regulation of sterol
metabolism.

- Yeast proteins HES1 and KES1; highly related proteins of 434 residues that seem to play a role in ergosterol 
synthesis. 

- Yeast OSH1, a protein of 859 residues that also plays a role in ergosterol synthesis.

- Yeast hypothetical protein YHR001W (437 residues).

- Yeast hypothetical protein YHR073w (996 residues).

- Yeast hypothetical protein YKR003w (448 residues).

[1091] All these proteins contain a moderately conserved domain of about 250 residues located in the C-terminal
half of OSBP, OSH1 and YHR073w and in the central section of the other proteins. As a signature pattern, the best
conserved part of this domain was selected, a region that contains a conserved pentapeptide.
[1092] Consensus pattern: E-[KQ]-x-S-H-[HR]-P-P-x-[STACF]-A

[1093] [ 1] Jiang B., Brown J.L., Sheraton J., Fortin N., Bussey H. Yeast 10:341-353(1994).
[1094] 383. FMN oxidoreductase 
[1095] 384. Oxidoreductase FAD/NAD-binding domain
Number of members: 250 
[1] 

Medline: 92084635 

The sequence of squash NADH:nitrate reductase and its relationship to the sequences of other flavoprotein oxidore- 
ductases. A family of flavoprotein pyridine nucleotide cytochrome reductases. 
Hyde GE, Crawford NM, Campbell W; 

J Biol Chem 1991;266:23542-23547. 

[2]Medline: 95111952

Crystal structure of the FAD-containing fragment of corn nitrate reductase at 2.5 A resolution: relationship to other
flavoprotein reductases.
Lu G, Campbell WH, Schneider G, Lindqvist Y;
Structure 1994;2:809-821. 

[1096] 385. (oxidored molyb) Eukaryotic molybdopterin oxidoreductases signature

A number of different eukaryotic oxidoreductases that require and bind a molybdopterin cofactor have been shown [1]
to share a few regions of sequence similarity. These enzymes are:

- Xanthine dehydrogenase (EC 1.1.1.204), which catalyzes the oxidation of xanthine to uric acid with the concomitant
reduction of NAD. Structurally, this enzyme of about 1300 amino acids consists of at least three distinct domains: 
an N-terminal 2Fe-2S ferredoxin-like iron-sulfur binding domain (see <PDOC00175>), a central FAD/NAD-binding 
domain and a C-terminal Mo-pterin domain. 

- Aldehyde oxidase (EC 1.2.3.1), which catalyzes the oxidation of aldehydes into acids. Aldehyde oxidase is highly
similar to xanthine dehydrogenase in its sequence and domain structure. 

- Nitrate reductase (EC 1.6.6.1), which catalyzes the reduction of nitrate to nitrite. Structurally, this enzyme of about
900 amino acids consists of an N-terminal Mo-pterin domain, a central cytochrome b5-type heme-binding domain 
(see <PDOC00170>) and a C-terminal FAD/NAD-binding cytochrome reductase domain. 

- Sulfite oxidase (EC 1.8.3.1), which catalyzes the oxidation of sulfite to sulfate. Structurally, this enzyme of about 
460 amino acids consists of an N-terminal cytochrome b5-binding domain followed by a Mo-pterin domain. 

There are a few conserved regions in the sequence of the molybdopterin-binding domain of these enzymes. The pattern 
used to detect these proteins is based on one of them. It contains a cysteine residue which could be involved in binding 
the molybdopterin cofactor. 

[1097] Consensus pattern: [GA]-x(3)-[KRNQHT]-x(11,14)-[LIVMFYWS]-x(8)-[LIVMF]-x-C-x(2)-[DEN]-R-x(2)-[DE]
[1098] [1] Wootton J.C., Nicolson R.E., Cock J.M., Walters D.E., Burke J.F., Doyle W.A., Bray R.C. Biochim. Biophys.
Acta 1057:157-185(1991).
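
Because the molybdopterin pattern contains a variable-length gap, x(11,14), and a single cysteine of particular interest, it may help to see how both can be handled when scanning a sequence. The Python fragment below is a sketch only: the regular expression is a hand translation of the consensus pattern given above, the names `MOCO_SIGNATURE` and `molybdopterin_cysteine_positions` are hypothetical, and the sequence in the usage note is synthetic filler built to satisfy the pattern rather than a real enzyme sequence.

```python
import re

# Hand translation of the consensus pattern; the cysteine that could be involved
# in cofactor binding is captured in a named group so that it can be located.
MOCO_SIGNATURE = re.compile(
    r"[GA].{3}[KRNQHT].{11,14}[LIVMFYWS].{8}[LIVMF].(?P<cys>C).{2}[DEN]R.{2}[DE]"
)


def molybdopterin_cysteine_positions(sequence: str) -> list:
    """Return 1-based positions of the pattern cysteine for each match found."""
    return [m.start("cys") + 1 for m in MOCO_SIGNATURE.finditer(sequence)]
```

For example, the synthetic string `"MS" + "G" + "PPP" + "K" + "P" * 12 + "L" + "P" * 8 + "V" + "P" + "C" + "PP" + "D" + "R" + "PP" + "E"` contains one match, with the gap resolved to 12 residues, and the function reports the cysteine at position 31.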

[1099] 386. (Oxidored q1) NADH-Ubiquinone/plastoquinone (complex I), various chains
coalescence of the oil They n^ ^^^^^^^^ dBai ^^^ by preventing
seedling 9^. Oieosins arLou^ «*«0- h *** during 

the lipid and phospholipid moieties * P ^ ' n,erfaCe * °" bod,es and P robablv interact with both 

^^Tso^i^^JS^ fT" " ^ ^ »*" - 

athic regfc* of variable .engih <»™ 60 to 1^^ " d 3 C - terminal am P hi P" 

of beta-strand structure and to interact w^the S 2n! 1 1 * ?"* * Pr0p0Sed ,0 be ™* U P 

[1082] consensus pattern: WH»IW?WflW^^ 

[ 1) Murphy D.J., Keen J.N., CSullivan J.N., Au DMY Edwards F w ,^ 

Shaw C.H., Ryan A.J. Bochim. Bfcphys. Ac* ioSbmSS? "* L " Gtob °" S T " 

[ 2] Tzen J.T.C., Lie G.C., Huang A.H.C. J. Biol. Chem. 267:15626-15634(1992). 

[1083] 379. (Orbi VP5) Orbivirus outer capsid protein VP5

ffi ^T aP Z ^ ,0CatiOn « me diflerent P"*™* and their rela^en to eac^ 

- Eukaryotic ornithine decarboxylase (EC 4 1 1 17) (ODC\ nnr «. . 

trescine. 1 i-i/MODC). ODC catalyzes the transformation of ornithine intopu- 

- Prokaryotic diaminopimelic acid decarboxylase (EC 4 1 1 20) (DAPDC) DAPnr k .u 
am.nop.me.b acid into lysine; the last step in the bbsyntheS if lysine ^ ^ C ° nVerSi ° n ° f di " 

" ^S^SSST' Pr ° ,ebl ^ ^ iS >"*-* ^ the bfcsynthes* of tab.oxin and is 

Bacterial and plant biosynthetic arginine decarboxylase fFr a i i iq* /ap^x 

These enzymes are collectively known as group IV decarboxylases [3].

fhe^i e rr ^ pc , 

Consensus pattern: [«WfteHU^^ 

[ 1] Bairoch A. Unpublished observations (1993).

[1088] 381. Osteopontin signature

[ 1] Butler W.T. Connect. Tissue Res. 23:123-136(1989).
[ 2] Gorski J.P. Calcif. Tissue Int. 50:391-396(1992).
[1076] 375. Orotidine 5'-phosphate decarboxylase active site

Orotidine 5'-phosphate decarboxylase (EC 4.1.1.23) (OMPdecase) [1,2] catalyzes the last step in the de novo
biosynthesis of pyrimidines, the decarboxylation of OMP into UMP. In higher eukaryotes OMPdecase is part, with
orotate phosphoribosyltransferase, of a bifunctional enzyme, while the prokaryotic and fungal OMPdecases are
monofunctional proteins. Some parts of the sequence of OMPdecase are well conserved across species. The best
conserved region is located in the N-terminal half of OMPdecases and is centered around a lysine residue which is
essential for the catalytic function of the enzyme. This region has been developed as a signature pattern.

[1077] Consensus pattern: [LIVMFTA]-[LIVMF]-x-D-x-K-x(2)-D-I-[GP]-x-T-[LIVMTA] [K is the active site residue]

[ 1] Jacquet M., Guilbaud R., Garreau H. Mol. Gen. Genet. 211:441-445(1988).
[ 2] Kimsey H.H., Kaiser D. J. Biol. Chem. 267:819-824(1992). 

[1078] 376. ATP synthase delta (OSCP) subunit signature

ATP synthase (proton-translocating ATPase) (EC 3.6.1.34) [1,2] is a component of the cytoplasmic membrane of
eubacteria, the inner membrane of mitochondria, and the thylakoid membrane of chloroplasts. The ATPase complex is
composed of an oligomeric transmembrane sector, called CF(0), which acts as a proton channel, and a catalytic core, 
termed coupling factor CF(1). 

One of the subunits of the ATPase complex, known as subunit delta in bacteria and chloroplasts or the Oligomycin 
Sensitivity Conferral Protein (OSCP) in mitochondria, seems to be part of the stalk that links CF(0) to CF(1). It either 
transmits conformational changes from CF(0) into CF(1) or is involved in proton conduction [3]. 
The different delta/OSCP subunits are proteins of approximately 200 amino-acid residues - once the transit peptide 
has been removed in the chloroplast and mitochondrial forms - which show only moderate sequence homology. 
The signature pattern used to detect ATPase delta/OSCP subunits is based on a conserved region in the C-terminal 
section of these proteins. 

[1079] Consensus pattern: [LIVM]-x-[LIVMFYT]-x(3)-[LIVMT]-[DENQK]-x(2)-[LIVM]-x-[GSA]-G-[LIVMFYGA]-x-
[LIVM]-[KRHENQ]-x-[GSEN]

[ 1] Futai M., Noumi T, Maeda M. Annu. Rev. Biochem. 58:111-136(1989). 
[ 2] Senior A.E. Physiol. Rev. 68:177-231(1988). 

[ 3] Engelbrecht S., Junge W. Biochim. Biophys. Acta 1015:379-390(1990). 
[1080] 377. Aspartate and ornithine carbamoyltransferases signature

Aspartate carbamoyltransferase (EC 2.1.3.2) (ATCase) catalyzes the conversion of aspartate and carbamoyl phosphate
to carbamoylaspartate, the second step in the de novo biosynthesis of pyrimidine nucleotides [1]. In prokaryotes
ATCase consists of two subunits: a catalytic chain (gene pyrB) and a regulatory chain (gene pyrI), while in eukaryotes
it is a domain in a multi-functional enzyme (called URA2 in yeast, rudimentary in Drosophila, and CAD in mammals
[2]) that also catalyzes other steps of the biosynthesis of pyrimidines.

Ornithine carbamoyltransferase (EC 2.1.3.3) (OTCase) catalyzes the conversion of ornithine and carbamoyl phosphate
to citrulline. In mammals this enzyme participates in the urea cycle [3] and is located in the mitochondrial matrix. In 
prokaryotes and eukaryotic microorganisms it is involved in the biosynthesis of arginine. In some bacterial species it 
is also involved in the degradation of arginine [4] (the arginine deaminase pathway). 

It has been shown [5] that these two enzymes are evolutionarily related. The predicted secondary structures of both
enzymes are similar and there are some regions of sequence similarity. One of these regions includes three residues
which have been shown, by crystallographic studies [6], to be implicated in binding the phosphoryl group of carbamoyl
phosphate.

This region was selected as a signature for these enzymes. 

Consensus pattern: F-x-[EK]-x-S-[GT]-R-T [S, R, and the 2nd T bind carbamoyl phosphate]

Note: the residue in position 3 of the pattern makes it possible to distinguish between an ATCase (Glu) and an OTCase (Lys).
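
The note above can be illustrated with a short sketch. Assuming the pattern reads F-x-[EK]-x-S-[GT]-R-T, it can be written as a regular expression in which position 3 is captured, so that a match is labelled ATCase (Glu) or OTCase (Lys). The function name and the peptides used to exercise it are hypothetical and are not sequences of the invention.

```python
import re

# F-x-[EK]-x-S-[GT]-R-T, with position 3 of the pattern captured.
SIGNATURE = re.compile(r"F.([EK]).S[GT]RT")


def classify_carbamoyltransferase(sequence):
    """Return 'ATCase', 'OTCase', or None when the signature is absent."""
    match = SIGNATURE.search(sequence.upper())
    if match is None:
        return None
    # Glu at position 3 indicates an ATCase, Lys an OTCase.
    return "ATCase" if match.group(1) == "E" else "OTCase"
```

A sequence that contains no match returns None, which only means that the signature is absent, not that the protein is unrelated to either enzyme.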

[ 1] Lerner C.G., Switzer R.L. J. Biol. Chem. 261:11156-11165(1986).

[ 2] Davidson J.N., Chen K.C., Jamison R.S., Musmanno L.A., Kern C.B. BioEssays 15:157-164(1993).

[ 3] Takiguchi M., Matsubasa T., Amaya Y., Mori M. BioEssays 10:163-166(1989).

[ 4] Baur H., Stalon V., Falmagne P., Luethi E., Haas D. Eur. J. Biochem. 166:111-117(1987).

[ 5] Houghton J.E., Bencini D.A., O'Donovan G.A., Wild J.R. Proc. Natl. Acad. Sci. U.S.A. 81:4864-4868(1984).

[ 6] Ke H.-M., Honzatko R.B., Lipscomb W.N. Proc. Natl. Acad. Sci. U.S.A. 81:4037-4040(1984).

[1081] 378. Oleosins signature 

Oleosins [1] are the proteinaceous components of plants' lipid storage bodies called oil bodies. Oil bodies are small
[1066] 371. NifU-like domain

S3 STS * ^ *"* c ' T= « 55S5P 48 unkn ° wn iii 

[1069] 372. Nitrilases / cyanide hydratase signatures

A conserved cysteine has been ^t^^Z^S^^^*^ * "* h0m0ne ^^acetic acid. 

atta*konthe^itri.ecarb^ 

<-9U is used toavok^e toxic e^ 

[1070] Consensus pattern: G-x(2)-[LIVMFY](2)-x-[IF]-x-E-x(2)-[LIVM]-x-G-Y-P
Consensus pattern: ^ GA Q h x (2 )-C- W E^ 

! p! ISSn? I!" IT' I! Na9aSaWa T - Yamada H Pr0C NatL ^ Sci - USA 90-247-251(1 993> 
[1071] 373. NusB family 

6. EMBO J 1998;17:4092-4100. Peteranderl R, Berglechner F, Richter G, Bacher A, Kessler H, Gemmecker

[E 374 j Neur Chan > Neurotransmitter-gated ion^hannels signature 

binding o, a ^fic ™^ **» • ** channe, upon the 

receptorsare known:-The nicotinic acetylcholine mtS^^^ - ,"^^^- 9 '^ 
of vertebrates, it is composed of four different siimrtS^ k , * Chann0 ' ln me "*** ^plates 

ometry of 2:1:1:1. | n neu^Vs thT^R Tlnw •« ^'P^' 3 ' """^ anddelta or «*h a molar stoichi- 

(a.so caned beta). NicoT KU^Z 

channel. The glycine receotor is a nf »nt am 1 ~L '"vertebrates. - The glycine receptor, an inhibitory chloride ion 
aminobutyrfc-S (GAB^ep.or ^!Z^X*°nT " d ^ ^ ^ 

GABA receptor is complex; at LJ to^J^^ZT ^ Wma ^ s,ruc,ure <* 

there are many variants in each 7^1^™ c v T ,0 6X,St (a,pha ' be,a - Samma, and delta) and 

The serotonin 5HT3 i^^X a T^ZIiri 0r . ! *" ^ ^ s «<> - 

^n.Thereareseven^^ 

transduce extracellular signal by activating G proteins whHe shtTi! - 2 T! ( ' HT2, and 5HT4 to 5HT7 > 
when activated causes fast depolarizing S 

Glutamate is the main excitat£ n^^SSTJSSi"! ?JXF*'!^ mm ^^ a ' M 

have been described and are named accoS 2 66 d * fferent ^ of 9 ,utamate rece P'ws 
quisqualate)^, known se^enT* sS^^^ N-methy,-D-as part ate (NMDA^and 

are composed of a large extracellular otw^It^ m ^ ^ "on-channels are structurally related. They 

transmembrane regions ^0^^ **— "V three hydrophobic^ 

hydrophobic reg^lfoundat^C-ter^ "*» - va '° bl ° «"*h. A fourth 

and Gly receptors are clearly evo^Z^^ZT21 ,he AchR " GABA - 5HT3 - 

similarities are either absent or veZeaK ^ JS^TTST SeqU6nC8 Similari,ieS " These sec » uence 
5HT3/Gly receptors, there are ^^^^21^^^*^ r^*' * m * °» 
bond essential to the tertiary structure of the receX a T^ZfT' AchR \ have bee " sh °™ *> '°rm a disulfide 
cysteines are also conserved. Therefor Mhfe rSSs usel- T b6 ' Ween ^ »» d ^'We-bonded 
[1075] Consensus pattern: C-x-[LIVMFQ]-x-[LIVMF]-x(2)-[FY]-P-x-D-x(3)-C [The two C's are linked by a disulfide
bond]

[ 1] Stroud R.M., McCarthy M.P., Shuster M. Biochemistry 29:11009-11023(1990).

[ 2] Betz H. Neuron 5:383-392(1990).

[ 3] Dingledine R., Myers S.J., Nicholas R.A. FASEB J. 4:2632-2645(1990).
[ 4] Barnard E.A. Trends Biochem. Sci. 17:368-374(1992).
[ 1] Campbell W.H., Kinghorn J.R. Trends Biochem. Sci. 15:315-319(1990).
[ 2] Crane B.R., Siegel L.M., Getzoff E.D. Science 270:59-67(1995).

[ 3] Gisselmann G., Klausmeier P., Schwenn J.D. Biochim. Biophys. Acta 1144:102-106(1993).
[ 4] Huang C.J., Barrett E.L. J. Bacteriol. 173:1544-1553(1991).

[1055] 367. (NMT) Myristoyl-CoA:protein N-myristoyltransferase signatures

Myristoyl-CoA:protein N-myristoyltransferase (EC 2.3.1.97) (Nmt) [1] is the enzyme responsible for transferring a myristate group to
the N-terminal glycine of a number of cellular eukaryotic and viral proteins. Nmt is a monomeric protein of about 50 to
60 Kd whose sequence appears to be well conserved. Two highly conserved regions have been developed as signature
patterns. The first one is located in the central section, the second in the C-terminal part.

[1056] Consensus pattern: E-I-N-F-L-C-x-H-K

Consensus pattern: K-F-G-x-G-D-G

[1057] [ 1] Rudnick D.A., McWherter C.A., Gokel G.W., Gordon J.I. Adv. Enzymol. 67:375-430(1993). 
[1058] 368. ADP-glucose pyrophosphorylase signatures (NTP_transferase)

[1059] ADP-glucose pyrophosphorylase (glucose-1-phosphate adenylyltransferase) [1,2] (EC 2.7.7.27) catalyzes a
very important step in the biosynthesis of alpha 1,4-glucans (glycogen or starch) in bacteria and plants: synthesis of
the activated glucosyl donor, ADP-glucose, from glucose-1-phosphate and ATP. ADP-glucose pyrophosphorylase is a
tetrameric allosterically regulated enzyme. It is a homotetramer in bacteria while in plant chloroplasts and amyloplasts,
it is a heterotetramer of two different, yet evolutionarily related, subunits. There are a number of conserved regions in
the sequence of bacterial and plant ADP-glucose pyrophosphorylase subunits. Three of these regions were selected
as signature patterns. The first two are N-terminal and have been proposed to be part of the allosteric and/or substrate-binding
sites in the Escherichia coli enzyme (gene glgC). The third pattern corresponds to a conserved region in the
central part of the enzymes.

[1060] Consensus pattern: [AG]-G-G-x-G-[STK]-x-L-x(2)-L-[TA]-x(3)-A-x-P-A-[LV]

Consensus pattern: W-[FY]-x-G-[ST]-A-[DNSH]-[AS]-[LIVMFYW]

Consensus pattern: [APV]-[GS]-M-G-[LIVMN]-Y-[IVC]-[LIVMFY]-x(2)-[DENPHK]

[ 1] Nakata P.A., Greene T.W., Anderson J.M., Smith-White B.J., Okita T.W., Preiss J. Plant Mol. Biol. 17:1089-1093(1991).

[ 2] Preiss J., Ball K., Hutney J., Smith-White B.J., Li L., Okita T.W. Pure Appl. Chem. 63:535-544(1991).
[1061] 369. Sodium/hydrogen exchanger family

[1062] Na/H antiporters are key transporters in maintaining the pH of actively metabolizing cells. The molecular 
mechanisms of antiport are unclear. 

These antiporters contain 10-12 transmembrane regions (M) at the amino-terminus and a large cytoplasmic region at 
the carboxyl terminus. The transmembrane regions M3-M12 share identity with other members of the family. The M6 
and M7 regions are highly conserved. Thus, this is thought to be the region that is involved in the transport of sodium 
and hydrogen ions. The cytoplasmic region has little similarity throughout the family. 

[1063] [1] Dibrov P, Fliegel L; FEBS Lett 1998;424:1-5. [2] Orlowski J, Grinstein S; J Biol Chem 1997;272:22373-22376.
[3] Numata M, Petrecca K, Lake N, Orlowski J; J Biol Chem 1998;273:6951-6959.
[1064] 370. Sodium:sulfate symporter family signature (Na_sulph_symp) 

Integral membrane proteins that mediate the intake of a wide variety of molecules with the concomitant uptake of
sodium ions (sodium symporters) can be grouped, on the basis of sequence and functional similarities, into a number
of distinct families. One of these families currently consists of the following proteins: - Mammalian sodium/sulfate
cotransporter [1]. - Mammalian renal sodium/dicarboxylate cotransporter [2], which transports succinate and citrate. -
Mammalian intestinal sodium/dicarboxylate cotransporter. - Chlamydomonas reinhardtii putative sulfur deprivation response
regulator SAC1 [3]. - Caenorhabditis elegans hypothetical proteins B0285.6, F31F6.6, K08E5.2 and R107.1.
- Escherichia coli hypothetical protein ylbS. - Haemophilus influenzae hypothetical protein HI0608. - Synechocystis
strain PCC 6803 hypothetical protein sll0640. - Methanococcus jannaschii hypothetical protein MJ0672. These transporters
are proteins of 430 to 620 amino acids which are highly hydrophobic and which probably contain about
12 transmembrane regions. As a signature pattern, a conserved region was selected which is located in or near the
penultimate transmembrane region.
[1065] Consensus pattern: [STACP]-S-x(2)-F-x(2)-P-[LIVM]-[GSA]-x(3)-N-x-[LIVM]-V

[ 1] Markovich D., Forgo J., Stange G., Biber J., Murer H. Proc. Natl. Acad. Sci. U.S.A. 90:8073-8077(1993).

[ 2] Pajor A.M. Am. J. Physiol. 270:642-648(1996).

[ 3] Davies J.P., Yildiz F.H., Grossman A. EMBO J. 15:2150-2159(1996). 
[ 8] Ju Q., Morrow B.E., Warner J.R. Mol. Cell. Biol. 10:5226-5234(1990).
[ 9] Klempnauer K.-H., Sippel A.E. EMBO J. 6:2719-2725(1987).

EUJL ^ ^^^^S^^P^Phate dehydrogenase signature 
[K*J Consent M „. m ; aiWWAMWWlln*,^^ 

Pelletier J, Genomics 1997;44:253-265 we.ssman BE, Zabel B, Housman DE t 

[2] Schnieders F, Dork T, Arnemann J, Vogel T, Werner M, Schmidtke J; Hum Mol Genet 1996;5:1801-1807.

[1049] 364. NB-ARC domain 

van der Biezen EA, Jones JD; Curr Biol 1998;8:226-227

[1050] 365. Nucleoside diphosphate kinases active site 

polysaccharide synthesi l^Gw i^Z JZf I "f^ 8cW SyntheS ' S - CTP ,or ««*« s ^*- "TP lor 
karyo.es. mere s£ms?o b^snS. SnS' ^ TSf" ^ miCr ° ,UbU,e P 0 *""***. '» - 
and/or has a distinct l^ Eu^Z^SZ^TT "* " 3 

B)[2]. By random associaHon (MMBAbT^I^ ?7 . L heXamerS * tW> hi9hl * retated (Aand 
point. NDK are prcie^ ^7 Kd'^t a ,a S Sin!!™ h ' 0rm isoenz y mes differin 9 *» ^eir isoelectric 

by transterofthe'term^phospSat ^P^ZV^L^^T^ * T" " 
its phosphate group to any NDF» to producean NTP NDK fel™m« h magnesium, the phosphoenzyme can transfer 
karyotic sources It has also been ^ At t^tT'n ^ ™ «Wcecl from prokaryotic and eu- 
associa.ed NDK. iZ^ESStS 22 2 n^f ^ Z^ 0 ™ 1 discs) protein, is a microtubule- 
highfy conserved through ^^21! TT™ *** ^t^* °' NDK has ^ 

[1052] Consensus pattern: N-x(2)-H-[GA]-S-D-[SA]-[LIVMPKNE] [H is the putative active site residue]

[ 1] Parks R.E. Jr., Agarwal R.P. (In) The Enzymes (3rd edition) 8:307-334(1973).

[ 2] Gilles A.-M., Presecan E., Vonica A., Lascu I. J. Biol. Chem. 266:8784-8789(1991).

[ 3] Biggs J., Hersperger E., Steeg P.S., Liotta L.A., Shearn A. Cell 63:933-940(1990).

2e ^JIL^n^ ~ (NIR - S,R > Nttrte (NiR) ,„ 

NiR: the higher piant *lo£SZ?E^ £?5 ? :^ S,mte,i0n °' ^ ^ are ^ * 
the electron donor; white funga. *^™mi^ 6 V££^ P " *"* redUCOd ,0rredOxin as 
electron donor. Both forms of NiR JSn a skohemell^il h °T menc pro,9,ns •« «»• NAD(P)H as the 
1 B 1 2) fSIR\ m ic th<, h »M . s,roneme - Fe and iron-sulfur centers. Sulfite reductase (NADPhtt rFr 

of the domab, and the two others are grouplTm ^^^^^ZT^"^^ 
of the iron-sulfur center; the .as, one a£> binds the sirohe™ g ou ZT"- * *" bind ' n9 

second cluster of cysteines was derived signaiure pattern from the region around the 

I1054J *»-n«p^prVHW^ 
- Escherichia coli hypothetical protein ygdU and HI0901, the corresponding Haemophilus influenzae protein.

- Escherichia coli hypothetical protein yjaD and HI0432, the corresponding Haemophilus influenzae protein.
- Escherichia coli hypothetical protein yrfE.

- Bacillus subtilis hypothetical protein yqkG.
- Bacillus subtilis hypothetical protein yzgD.
- Yeast hypothetical protein YGL067w.

[1040] It is proposed [2] that the conserved domain could be involved in the active center of a family of pyrophosphate-releasing
NTPases. As a signature pattern the core region of the domain was selected; it contains four conserved
glutamate residues.

[1041] Consensus pattern: G-x(5)-E-x(4)-[STAGC]-[LIVMAC]-x-R-E-[LIVMFT]-x-E-E

[1] Michaels M.L., Miller J.H. J. Bacteriol. 174:6321-6325(1992).
[2] Koonin E.V. Nucleic Acids Res. 21:4847-4847(1993).

[3] Mejean V., Salles C., Bullions M.J., Bessman M.J., Claverys J.-P. Mol. Microbiol. 11:323-330(1994).

[4] Sakumi K., Furuichi M., Tsuzuki T., Kakuma T., Kawabata S., Maki H., Sekiguchi M. J. Biol. Chem. 268:23524-23530(1993).

[5] Thorne N.M.H., Hankin S., Wilkinson M.C., Nunez C., Barraclough R., McLennan A.G. Biochem. J. 311:717-721(1995).

[1042] 361. Myb DNA-binding domain repeat signatures 

The retroviral oncogene v-myb, and its cellular counterpart c-myb, encode nuclear DNA-binding proteins that specifically
recognize the sequence YAAC(G/T)G [1]. The myb family also includes the following proteins: - Drosophila D-myb
[2]. - Vertebrate myb-like proteins A-myb and B-myb [3]. - Maize C1 protein, a trans-acting factor which controls
the expression of genes involved in anthocyanin biosynthesis. - Maize P protein [4], a trans-acting factor which regulates
the biosynthetic pathway of a flavonoid-derived pigment in certain floral tissues. - Arabidopsis thaliana protein GL1 [5],
required for the initiation of differentiation of leaf hair cells (trichomes). - A number of myb/c1-related proteins in maize
and barley, whose roles are not yet known [4]. - Yeast BAS1 [7], a transcriptional activator for the HIS4 gene. - Yeast
REB1 [8], which recognizes sites within both the enhancer and the promoter of rRNA transcription, as well as upstream
of many genes transcribed by RNA polymerase II. - Fission yeast cdc5, a possible transcription factor whose activity
is required for cell cycle progression and growth during G2. - Fission yeast myb1, which regulates telomere length and
function. - Yeast hypothetical protein YMR213w. One of the most conserved regions in all of these proteins is a domain
of 160 amino acids. It consists of three tandem repeats of 51 to 53 amino acids. In myb, this repeat region has been
shown [9] to be involved in DNA-binding. The major part of the first repeat is missing in retroviral v-myb sequences
and in plant myb-related proteins. Yeast REB1 differs from the other proteins in this family in having a single myb-like
domain. As shown in the following schematic representation, two signature patterns for myb-like domains were developed;
the first is located in the N-terminal section, the second spans the C-terminal extremity of the domain.

xxxxxxxxxWxxxEDxxxxxxxxxxxxxx WxxIxxxxxxRxxxxxxxxWxxxx ♦♦»♦***** 

. Position of the patterns. 

[1043] Consensus pattern: W-[ST]-x(2)-E-[DE]-x(2)-[LIV]
Consensus pattern: W-x(2)-[LI]-[SAG]-x(4,5)-R-x(8)-[YW]-x(3)-[LIVM]

Note: this pattern detects the three copies of the domain in myb, D-myb, A-myb and B-myb; the second of the two
complete copies of plant myb-related proteins, and the last two copies of yeast BAS1.
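
Because a protein may carry several copies of the myb-like domain, a scan for the second pattern has to report every occurrence. The following is a minimal sketch, assuming the pattern reads W-x(2)-[LI]-[SAG]-x(4,5)-R-x(8)-[YW]-x(3)-[LIVM]; the variable spacer x(4,5) becomes the quantifier {4,5}, and a lookahead is used so that overlapping candidates are not skipped. The function name is hypothetical, and any sequence passed to it is supplied by the user.

```python
import re

# W-x(2)-[LI]-[SAG]-x(4,5)-R-x(8)-[YW]-x(3)-[LIVM]
# The whole pattern sits inside a lookahead so that the scanner advances
# one residue at a time and overlapping matches are still reported.
MYB_REPEAT = re.compile(r"(?=(W.{2}[LI][SAG].{4,5}R.{8}[YW].{3}[LIVM]))")


def find_myb_repeats(sequence):
    """Return (1-based start position, matched segment) for every hit."""
    return [(m.start() + 1, m.group(1)) for m in MYB_REPEAT.finditer(sequence)]
```

Each hit is 23 or 24 residues long, depending on whether the spacer took four or five residues.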

[ 1] Biedenkapp H., Borgmeyer U., Sippel A.E., Klempnauer K.-H. Nature 335:835-837(1988).

[ 2] Peters C.W.B., Sippel A.E., Vingron M., Klempnauer K.-H. EMBO J. 6:3085-3090(1987).

[ 3] Nomura N., Takahashi M., Matsui M., Ishii S., Date T., Sasamoto S., Ishizaki R. Nucleic Acids Res. 16:11075-11090(1988).

[ 4] Grotewold E., Athma P., Peterson T. Proc. Natl. Acad. Sci. U.S.A. 88:4587-4591(1991).

[ 5] Oppenheimer D.G., Herman P.L., Sivakumaran S., Esch J., Marks M.D. Cell 67:483-493(1991).

[ 6] Marocco A., Wissenbach M., Becker D., Paz-Ares J., Saedler H., Salamini F., Rohde W. Mol. Gen. Genet. 216:183-187(1989).

[ 7] Tice-Baldwin K., Fink G.R., Arndt K.T. Science 246:931-935(1989).
[1037] 359. Prokaryotic molybdopterin oxidoreductases signatures (molybdopterin)

reductase (EC 1.7.99.4V ThL e n ^^^^ " ***** ^ nftra, ° 

obic growth. ThT^e is cor^^oS^ 

is the molybdopterin-binding subunft. eSw^SS^, ^ ' . "? ^ a * ta chain <9 ene «*> 

which aiso contains a molyLoptet t^oZT^ZZTT' k ^ ^ 

reductase (DMSO reductase). DMSO !S2^SZ*S^5?^ coli anaerobic dimethyl suifoxide 

-^"ecompo^^^^ 

molybdoptem. - Escherichia coK biotin sulfoxide reductases (genTs TisC^d ttsa TOs iSH^ ** 
neous oxidation product of biotin BDS back to hiohn n « ^' en zy™ reduces a sponta- 

sutfoxtfe , as a hiotin source. - J^SSS^S! SSiS S T ^ 

dioxide. In addrt^ ,o mo.yLdop,2in l S 

b.nds molybdopterin. - Salmonella typhimurium thiosu>lXeductareTQene Ifei ^ ^ ^ ^ PS ' A) 
N-oxide reductase (EC 1 6 6 9) (oene torA) rat iMit ra .» , J / P ' cSChenchla «* trimethylamine- 

nasA, A,ca.i 9e nes eutro^E^^^ K,ebsiel,a <9ene 

Synechococcus PCC 7942 (gene narB) These ^ oroSns £ ^ ' Th '° Sphaera P^otropha (gene napA), and 
ins,e. Three signature pattLforthis?^^^ 

terminal section and contains two cysteine residim* nnrh^. ; , ? a conserved re 9K>n in the N- 

be noted tha, this region is deS? ""N**™ '« 

central part of these enzymes PaBem 18 deWed ' r0m a conserved ^ion located in the 

£55- Consensus pattem: [S ™ ] - x -' CH ^^^ 

Consensus pattern: [STAhx-[STAC](2)-x(2)-[STA].D- f LIVMY](2K-P-x-[STAClf2)-xf2» P 

iTSSl^SST > C0CK J M " ° E ■ *** W.A.. Bray R.C. Bfcchim. Biophy, 

[ 2] Bilous P.T., Cole S.T., Anderson W.F., Weiner J.H. Mol. Microbiol. 2:785-795(1988).
[4,Me J ean V ..Lobb l -N,o l C.,^^ 

[1039] 360. Bacterial mutT domain signature

opposite to dA and dC residues of templa^rSS^ 
MutT specif ica.V degrades 8-oxo^gTp t the m^o^^^^^ 

is a small protein of about 12 to 15 Mn^^SSS^T, °' MutT 

is found in the N-terminal part of mutT can also to founrtTi 1 ? f °' ab0Ut 40 amin0 acid residues - 
These proteins are: 66 '° Und 3 Va " e,y of other P^^tic, viral, and eukaryotic proteins. 

- Streptomyces pneumoniae mutX.

" ^ n°T^ U ° m PlaSmid PSAM2 ° f Stre P^yces ambofaciens. 

- Bartonella bacilliformis invasion protein A (gene invA).

- Escherichia coli dATP pyrophosphohydrolase.

- Protein D250 from African swine fever viruses.
- Proteins D9 and D10 from a variety of poxviruses.

- Mammalian 7,8-dihydro-8-oxoguanine triphosphatase (EC 3.1.6.-) [4].

- Escherichia coli hypothetical protein yfaO.
[ 1] Woessner J. Jr. FASEB J. 5:2145-2154(1991).

[ 2] Sanchez-Lopez R., Nicholson R., Gesnel M.C., Matrisian L.M., Breathnach R. J. Biol. Chem. 263:11892-11899(1988).

[ 3] Park A.J., Matrisian L.M., Kells A.F., Pearson R., Yuan Z., Navre M. J. Biol. Chem. 266:1584-1590(1991).
[ 4] Lepage T., Gache C. EMBO J. 9:3003-3012(1990).

[ 5] Kinoshita T., Fukuzawa K., Shimada T., Saito T., Matsuda Y. Proc. Natl. Acad. Sci. U.S.A. 89:4693-4697(1992).
[1033] 357. Vertebrate metallothioneins signature (metalthio) 

Metallothioneins (MT) [1,2,3] are small proteins which bind heavy metals such as zinc, copper, cadmium, nickel, etc.,
through clusters of thiolate bonds. MTs occur throughout the animal kingdom and are also found in higher plants, fungi
and some prokaryotes. On the basis of structural relationships MTs have been subdivided into three classes. Class I
includes mammalian MTs as well as MTs from crustaceans and molluscs with clearly related primary structure.
Class II groups together MTs from various species such as sea urchins, fungi, insects and cyanobacteria which display
no or only very distant correspondence to class I MTs. Class III MTs are atypical polypeptides containing gamma-glutamylcysteinyl
units. Vertebrate class I MTs are proteins of 60 to 68 amino acid residues; 20 of these residues are
cysteines that bind 7 bivalent metal ions. As a signature pattern, a region that spans 19 residues and contains
seven of the metal-binding cysteines was chosen; this region is located in the N-terminal section of class I MTs.
[1034] Consensus pattern: C-x-C-[GSTAP]-x(2)-C-x-C-x(2)-C-x-C-x(2)-C-x-K- 
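
Consensus patterns of the kind given above can be applied to a sequence by translating them into regular expressions. The following is a minimal sketch of such a translation, covering only the notation used in this section: x for any residue, [..] for allowed residues, {..} for excluded residues, and a repeat count such as (2) or (4,5). The function name and the test peptide are hypothetical; the peptide was constructed only to satisfy the pattern.

```python
import re


def prosite_to_regex(pattern):
    """Translate a consensus pattern such as C-x-C-[GSTAP]-x(2)-C into a regex."""
    parts = []
    for element in pattern.strip("-").split("-"):
        # Separate the element from an optional repeat suffix: (n) or (n,m).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.groups()
        if core == "x":
            token = "."                       # any residue
        elif core.startswith("[") and core.endswith("]"):
            token = core                      # allowed residues
        elif core.startswith("{") and core.endswith("}"):
            token = "[^" + core[1:-1] + "]"   # excluded residues
        else:
            token = re.escape(core)           # a single fixed residue
        if low is not None:
            token += "{" + low + ("," + high if high else "") + "}"
        parts.append(token)
    return "".join(parts)


MT_REGEX = prosite_to_regex("C-x-C-[GSTAP]-x(2)-C-x-C-x(2)-C-x-C-x(2)-C-x-K-")
match = re.search(MT_REGEX, "MDPNCSCAAGDSCTCAGSCKCKECKCTSCKK")
```

For the metallothionein pattern the resulting expression is C.C[GSTAP].{2}C.C.{2}C.C.{2}C.K, which spans 19 residues and contains seven cysteines, in agreement with the description above.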



[ 1] Hamer D.H. Annu. Rev. Biochem. 55:913-951(1986). 

[ 2] Kagi J.H.R., Schaffer A. Biochemistry 27:8509-8515(1988). 

[ 3] Binz P. -A. Thesis, 1996, University of Zurich. 



[1035] 358. Mitochondrial energy transfer proteins signature (mito_carr) 

Different types of substrate carrier proteins involved in energy transfer are found in the inner mitochondrial membrane
[1 to 5]. These are: - The ADP/ATP carrier protein (AAC) (ADP/ATP translocase) which exports ATP into the cytosol
and imports ADP into the mitochondrial matrix. The sequence of AAC has been obtained from various mammalian,
plant and fungal species. - The 2-oxoglutarate/malate carrier protein (OGCP), which exports 2-oxoglutarate into the
cytosol and imports malate or other dicarboxylic acids into the mitochondrial matrix. This protein plays an important
role in several metabolic processes such as the malate/aspartate and the oxoglutarate/isocitrate shuttles. - The phosphate
carrier protein, which transports phosphate groups from the cytosol into the mitochondrial matrix. - The brown
fat uncoupling protein (UCP) which dissipates oxidative energy into heat by transporting protons from the cytosol into
the mitochondrial matrix. - The tricarboxylate transport protein (or citrate transport protein) which is involved in citrate-H+/malate
exchange. It is important for the bioenergetics of hepatic cells as it provides a carbon source for fatty acid
and sterol biosyntheses, and NAD for the glycolytic pathway. - The Graves' disease carrier protein (GDC), a protein
of unknown function recognized by IgG in patients with active Graves' disease. - Yeast mitochondrial proteins MRS3
and MRS4. The exact function of these proteins is not known. They suppress a mitochondrial splice defect in the first
intron of the COB gene and may act as carriers, exerting their suppressor activity by modulating solute concentrations
in the mitochondrion. - Yeast mitochondrial FAD carrier protein (gene FLX1). - Yeast protein ACR1 [6], which seems
essential for acetyl-CoA synthetase activity. - Yeast protein PET8. - Yeast protein PMT. - Yeast protein RIM2. - Yeast
protein YHM1/SHM1. - Yeast protein YMC1. - Yeast protein YMC2. - Yeast hypothetical proteins YBR291c, YEL006w,
YER053c, YFR045w, YHR002w, and YIL006w. - Caenorhabditis elegans hypothetical protein K11H3.3. Two other proteins
have been found to belong to this family, yet are not localized in the mitochondrial inner membrane: - Maize
amyloplast Brittle-1 protein. This protein, found in the endosperm of kernels, could play a role in amyloplast membrane
transport. - Candida boidinii peroxisomal membrane protein PMP47 [7]. PMP47 is an integral membrane protein of the
peroxisome and it may play a role as a transporter. These proteins all seem to be evolutionarily related. Structurally,
they consist of three tandem repeats of a domain of approximately one hundred residues. Each of these domains
contains two transmembrane regions. As a signature pattern, one of the most conserved regions in the repeated domain
was selected, located just after the first transmembrane region.
[1036] Consensus pattern: P-x-[DE]-x-[LIVAT]-[RK]-x-[LRH]-[LIVMFY]-[QGAIVM]

[ 1] Klingenberg M. Trends Biochem. Sci. 15:108-112(1990).

[ 2] Walker J.E. Curr. Opin. Struct. Biol. 2:519-526(1992).

[ 3] Kuan J., Saier M.H. Jr. CRC Crit. Rev. Biochem. 28:209-233(1993).

[ 4] Kuan J., Saier M.H. Jr. Res. Microbiol. 144:671-672(1993).

[ 5] Nelson D.R., Lawson J.E., Klingenberg M., Douglas M.G. J. Mol. Biol. 230:1159-1170(1993).
[ 6] Palmieri F. FEBS Lett. 346:48-54(1994).

[ 7] Jank B., Habermann B., Schweyen R.J., Link T.A. Trends Biochem. Sci. 18:427-428(1993).
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Kita M, Hisada S, Endc-.nagaki T, OmZTSZ^^S""* " 9 d ™ ta '"»* I**-* T 
[1023] 354. MAGE family

cellular function of this family is unknovT P ^ ^ 061,8 ° f ,he ***** embryo. The 

Kl J> Wa " 9 * Ya " 9 HM " ^ N ' F ' P- 0- Mo, Gene, 

E<*^^^ ^ ox«at,e d*=arbox- 

enzyme [1,2.3]: - NA^pendenT^ 

decarboxylate oxaloacetate (OAA) It !• to^iS£^ . uses preferentially NAD and has the ability to 
which uses preferential JK£ J k ISSSSTS^h NA ^ e P. ende "' enzyme (EC l.l.lSn . 
and isaheterodimer of highly relatedsubunrts N SS^«? r " " ^ mitochondfial n****^ 
forNADPandhastheabillh, oH^^Zl^ "!^™^««^(KjJJ^ 1 ^ ha , a preference 
there are two isozymes: one, m*ei™X^ »n-amma te> 
cytosotic. There are two other proteins which are closetv s r^^'I ,^ f ^ ^ ,sozvrnes: chloroplastic and 
sfcA. whose function is not ye'known but St^^ 

thefca. protein YKL029c, a probable ma.ic enzyme. There are ^1^^ T" 0 ?*>"»• " Yeast ^P 0 " 
Two of them seem to be involved in binding NAD or NADP Th^EJ? ~««* "W n the enzyme sequences, 
of the enzymes, is not yet known Th s Tn^f h ? ■ s '9 nrf,cance °» »» «rd one. located in the central part 

11027] Consensus SSSSKS?? tSlK Si9na,U ' e ^ '°' *~ «W 
[1028, [ 1, Artus N.!,. 

K-ystek E., Dworkin M.B. J. Biol. Chem. 266:301 6-3cSn 991 W 3 S„ I £ I n ' ant9 KA « Maws ^ 
269:2827-2833(1994). ^1(1991). [ 3] Long J.J., Wang J.-L, Berry J O. J. Biol. Chem. 

[1029] 356. (matrixin) 

Matrixins cysteine switch (aka peptidase_M10)

[1030] Mammalian extracellular matrix metalloproteinases <EC 3dp/ , alt ,, 

<PDOC00129>), are zinc-dependent enzvmes J^vZ^L^l \ 1 80 knoWn as ma,rixins W 
from the mature enzyme by the presence of an N term Z!^T Si T " tonn (2 * mo 0 Bn > ™ 

residues downstream of th C-SaTend I o ti ^0^^^ ^ A is tou " d «*° 

bition of matins [2,3]; a cyste^e J^S^SS^S^T ^ * h aU,0inhi " 

^ region ^ been calIed tne ^ £ SSSSTJ^ "» ^ th ° e -V™- 
[1031] A cysteine switch has been found in the following zinc proteases: 

- MMP-1 (EC 3.4.24.7) (interstitial collagenase).

- MMP-2 (EC 3.4.24.24) (72 Kd gelatinase).

- MMP-3 (EC 3.4.24.17) (stromelysin-1).

- MMP-7 (EC 3.4.24.23) (matrilysin).

- MMP-8 (EC 3.4.24.34) (neutrophil collagenase).

- MMP-9 (EC 3.4.24.35) (92 Kd gelatinase).

- MMP-10 (EC 3.4.24.22) (stromelysin-2).

- MMP-11 (EC 3.4.24.-) (stromelysin-3).

- MMP-12 (EC 3.4.24.65) (macrophage metalloelastase).

- MMP-13 (EC 3.4.24.-) (collagenase 3).

- MMP-14 (EC 3.4.24.-) (membrane-type matrix metalloproteinase 1).

- MMP-15 (EC 3.4.24.-) (membrane-type matrix metalloproteinase 2).
- MMP-16 (EC 3.4.24.-) (membrane-type matrix metalloproteinase 3).

- Sea urchin hatching enzyme (EC 3.4.24.12) (envelysin) [4].

- Chlamydomonas reinhardtii gamete lytic enzyme (GLE) [5].
[0994] 343. Mandelate racemase / muconate lactonizing enzyme family signatures

Mandelate racemase (EC 5.1.2.2) (MR) and muconate lactonizing enzyme (EC 5.5.1.1) (MLE) are two bacterial enzymes
involved in aromatic acid catabolism. They catalyze mechanistically distinct reactions yet they are related at
the level of their primary, quaternary (homooctamer) and tertiary structures [1,2]. A number of other proteins also seem
to be evolutionarily related to these two enzymes. These are: - The various plasmid-encoded chloromuconate cycloisomerases
(EC 5.5.1.7). - Escherichia coli protein rspA [3]; rspA seems to be involved in the degradation of homoserine
lactone (HSL) or of one of its metabolites. - Escherichia coli hypothetical protein ycjG. - Escherichia coli hypothetical
protein yidU. - A hypothetical protein from Streptomyces ambofaciens [4]. Two signature patterns have been developed
for these enzymes; both contain conserved acidic residues.

[0995] The second pattern contains an aspartate and a glutamate which are ligands for either a magnesium ion (in
MR) or a manganese ion (in MLE).

[0996] Consensus pattern: A-x-[SAGCN]-[SAG]-[LIVM]-[DEQ]-x-A-[LA]-x-[DE]-[LIA]-x-[GA]-[KRQ]-x(4)-[PSA]-[LIV]-x(2)-L-[LIVMF]-G

Consensus pattern: [LIVF]-x(2)-D-x-[NH]-x(7)-[ACL]-x(6)-[LIVMF]-x(7)-[LIVM]-E-[DENQ]-P [D and E bind a divalent
metal ion]
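
Where a pattern marks functionally important residues, as the second pattern does for the metal-binding aspartate and glutamate, a match can also report the positions of those residues. The following is a minimal sketch, assuming the pattern reads [LIVF]-x(2)-D-x-[NH]-x(7)-[ACL]-x(6)-[LIVMF]-x(7)-[LIVM]-E-[DENQ]-P. The function and group names are hypothetical.

```python
import re

# [LIVF]-x(2)-D-x-[NH]-x(7)-[ACL]-x(6)-[LIVMF]-x(7)-[LIVM]-E-[DENQ]-P
# The two metal ligands are wrapped in named groups so that their
# positions can be read back from the match object.
METAL_SITE = re.compile(
    r"[LIVF].{2}(?P<asp>D).[NH].{7}[ACL].{6}[LIVMF].{7}[LIVM](?P<glu>E)[DENQ]P"
)


def metal_ligand_positions(sequence):
    """Return 1-based positions of the Asp and Glu ligands, or None."""
    match = METAL_SITE.search(sequence)
    if match is None:
        return None
    return {"asp": match.start("asp") + 1, "glu": match.start("glu") + 1}
```

Only the first match in the sequence is reported; the pattern itself cannot tell whether the bound ion is magnesium or manganese.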

[ 1] Neidhart D.J., Kenyon G.L., Gerlt J.A., Petsko G.A. Nature 347:692-694(1990).

[ 2] Petsko G.A., Kenyon G.L., Gerlt J.A., Ringe D., Kozarich J.W. Trends Biochem. Sci. 18:372-376(1993).
[ 3] Huisman G.W., Kolter R. Science 265:537-539(1994).

[ 4] Schneider D., Aigle B., Leblond P., Simonet J.M., Decaris B. J. Gen. Microbiol. 139:2559-2567(1993).
[0997] 344. Merozoite Surface Antigen 2 (MSA-2) family 

[0998] Thomas AW, Carr DA, Carter JM, Lyon JA, Mol Biochem Parasitol 1990;43:211-220.
[0999] 345. MSP (Major sperm protein) domain. 

[1000] Major sperm proteins are involved in sperm motility. These proteins oligomerise to form filaments. Partial 
matches to this domain are also found in other non MSP proteins. These include Swiss:P40075 and Swiss:P34593 
[1001] [1] Bullock TL, Roberts TM, Stewart M, J Mol Biol 1996;263:284-296. [2] King KL, Stewart M, Roberts TM,
Seavy M, J Cell Sci 1992;101:847-857.

[1002] 346. (Matrix) Viral matrix protein. Found in Morbillivirus, paramyxovirus and pneumovirus. Number of members: 105

[1003] 347. O-methyltransferase (methyltransf)

[1004] This family includes a range of O-methyltransferases. These enzymes utilise S-adenosyl methionine.
[1005] [1] Keller NP, Dischinger HC, Bhatnagar D, Cleveland TE, Ullah AH, Appl Environ Microbiol 1993;59:479-484.
[1006] 348. Magnesium chelatase, subunit ChlI

[1007] Magnesium-chelatase is a three-component enzyme that catalyses the insertion of Mg2+ into protoporphyrin
IX. This is the first unique step in the synthesis of (bacterio)chlorophyll. Because of this, it is thought that Mg-chelatase has
an important role in channeling intermediates into the (bacterio)chlorophyll branch in response to conditions suitable
for photosynthetic growth. ChlI and BchD have molecular weights between 38 and 42 kDa.

[1008] [1] Walker CJ, Willows RD, Biochem J 1997;327:321-333. [2] Petersen BL, Jensen PE, Gibson LC, Stummann 
BM, Hunter CN, Henningsen KW, J Bacteriol 1998;180:699-704. 
[1009] 349. Plasmid recombination enzyme (Mob_Pre)

[1010] With some plasmids, recombination can occur in a site-specific manner that is independent of RecA. In such
cases, the recombination event requires another protein called Pre. Pre is a plasmid recombination enzyme. This
protein is also known as Mob (conjugative mobilization).
[1011] [1] Priebe SD, Lacks SA, J Bacteriol 1989;171:4778-4784. 
[1012] 350. Monooxygenase 

[1013] This family includes diverse enzymes that utilise FAD. 

[1014] [1] Gatti DL, Palfey BA, Lah MS, Entsch B, Massey V, Ballou DP, Ludwig ML, Science 1994;266:110-114.
[1015] 351. Mov34 family 

[1016] Members of this family are found in proteasome regulatory subunits, eukaryotic initiation factor 3 (eIF3) subunits
and regulators of transcription factors.

[1017] [1] Aravind L, Ponting CP, Protein Sci 1998;7:1250-1254. [2] Hershey JW, Asano K, Naranda T, Vornlocher
HP, Hanachi P, Merrick WC, Biochimie 1996;78:903-907.
[1018] 352. Myc amino-terminal region (Myc_N_term) 

[1019] The myc family belongs to the basic helix-loop-helix leucine zipper class of transcription factors, see HLH. 
Myc forms a heterodimer with Max, and this complex regulates cell growth through direct activation of genes involved
in cell replication [2]. 

[1020] [1] Facchini LM, Penn LZ, FASEB J 1998;12:633-651. [2] Grandori C, Eisenman RN, Trends Biochem Sci
- MCM7, also known as CDC47 or Prolifera (in A.thaliana). This family is also present in archaebacteria; in M.jannaschii
there are four members: MJ0363, MJ0961, MJ1489 and MJECL13. The presence of a putative ATP-binding domain implies
that these proteins may be involved in an ATP-consuming step in the initiation of DNA replication.

[0989] Consensus pattern: G-[IVT]-[LVAC](2)-[IVT]-D-[DE]-[FL]-[DNST]

[ 1] Coxon A., Maundrell K., Kearsey S.E. Nucleic Acids Res. 20:5571-5577(1992).

isi^-K^CeTS^ 

[ 4] Koonin E.V. Nucleic Acids Res. 21:2541-2547(1993).

[0990] 341. Macrophage migration inhibitory factor family signature (MIF)
Aprotemcalledrnacrophagemig^^^ 

responses. It play a pivotal role in the host response to enttollt th^ T 10,9 host ^mmatory 

hormone that regulates systemic inf.amma.ort- TlS°- 1 2?, T" l ° ™ n 38 3 pi,urtar y "s^ 8 ' 
essedfromalarger precursor. &*pto^S:^ 
biosynthesisandthattautomerizesDKiopac^Jro^^ 

It is a protein of 117 residues highV reS toTpTrn^T . J f!f art,OXy,a,i0nto9ive5 - 6 - t,in VcJroxyindole (DHI) 
be reiated to glutathbne S-transVeras^ "** *«*™ — haTbeen said o 

proteins, a conserved reg*n was ~JZ£tZ^J£^ 3 » * — 

[0991] Consensus pattern: [DEhP-C-A-xt^-fLIVMl-x-S-l-G-x-fUVlJlS- 
[ 1] Bucala R. Immunol. Lett. 43:23-26(1994).

IvS^^*"* 1 *" ^ ""■"»"» Rorsman H. Bfcchem. Bbphys. R e , Commu, 197: 

[ 3] Pearson W.R. Protein Sci. 3:525-527(1994).

[0992] 342. MIP family signature

.s the major component of lens fiber gap junctions Gao unS£^»H : " f Mammal,an ™** P^tein (MIP). MIP 

from one ce.. to another. - M-nrn^J^^^JS^^ ^change of ions and smaH molecule 
plasma membranes of red cells and kidney ^S^ZS^^T W T: Specific channels provide the 
permitting water to move in the direction of ™^Tm^T*TT h ' 9h permeabi,i, y *> ™>* thereby 
peribacteroidmembraneinducedduringToduS^teoum^ 'J*^ a component of the 

proteins (TIP). There are various iso"orrn^of TIP- alpha ^see^nlf 6 ' ^*' urn '"'action. - Plants tonoplast intrinsic 
Theseproteinsmay allowthe diffusion ofTa.e aS^S*^!!* ? ^ "* WSi < wat «^ess induced). 
- Bacteria, g,ycero. facimator protein (ge^g pFT S^ 

membrane. - SalmoneHa typNmuriJ propan^lXsS S ^ Ce '° > aCr0SS me Wasmic 

amux .aci.ita.or protein. - DrosophHa ^Tpr^^lT^Z^ ' *"* ^ 3 
mun.cat.on; it may .unctions by allowing me transoort «J r«ZT P may media,e mi ^ c ^' com- 

exodermal ce.l to become an epidermobtes, instead a ne^ J v "* Send "9 a si 9" al «* an 

thetica. protein trom the pepX region of hSSUl ^^TJ^*™ ^ YFL ° 54c - " A 'VP- 
segments. Computer analysis shows that tSseZeS orrZT k P ^ 86601 ,0 ocnlain six ^membrane 
an ancestra. protein that contained tZ IlZESEJS^,* * dUp ' iCafon eve "« «™ 

was se.ec.ed whk* is .oca.ed in a P^1JSS^?KL^ T ^ 8 We " COnServed fe 9™ 
[0993] Consensus pattern: [HNQA]-x-N-P-[ST^]^UVM ,hird lra ™rane region, 

f 2 StoM p R S 6r u u a,er MK J ' CRC Cr " ReV - Bioche ^- 28:235-257(1993) 
[ 2] Baker M.E., Saier M.H. Jr. Cell 60:185-186(1990).

IffiiS,^ K ° ■ H ° efte H - Chri8peete ^ ^ee, G., Sanda, N.N.. Saier M H Jr Mol 
isic^M^^^^^^^ 
[ 6] Flower D.R., North A.C.T., Attwood T.K. Protein Sci. 2:753-761(1993).
[ 7] Flower D.R. FEBS Lett. 333:99-102(1993).

[0984] 338. Lipoxygenases iron-binding region signatures 

Lipoxygenases (EC 1.13.11.-) are a class of iron-containing dioxygenases which catalyze the hydroperoxidation of
lipids containing a cis,cis-1,4-pentadiene structure. They are common in plants where they may be involved in a number
of diverse aspects of plant physiology including growth and development, pest resistance, and senescence or responses
to wounding [1]. In mammals a number of lipoxygenase isozymes are involved in the metabolism of prostaglandins
and leukotrienes [2]. Sequence data is available for the following lipoxygenases: - Plant lipoxygenases (EC 1.13.11.12).
Plants express a variety of cytosolic isozymes as well as what seems [3] to be a chloroplast isozyme. - Mammalian
arachidonate 5-lipoxygenase (EC 1.13.11.34). - Mammalian arachidonate 12-lipoxygenase (EC 1.13.11.31). - Mammalian
erythroid cell-specific 15-lipoxygenase (EC 1.13.11.33). The iron atom in lipoxygenases is bound by four ligands,
three of which are histidine residues [4]. Six histidines are conserved in all lipoxygenase sequences; five of them are
found clustered in a stretch of 40 amino acids. This region contains two of the three iron-binding histidines; the other histidines
have been shown [5] to be important for the activity of lipoxygenases. As signatures for this family of enzymes, two
patterns in the region of the histidine cluster were selected. The first pattern contains the first three conserved histidines
and the second pattern includes the fourth and the fifth.

Consensus pattern: H-[EQ]-x(3)-H-x-[LM]-[NQRC]-[GST]-H-[LIVMSTAC](3)-E [The second and third H's bind iron]-
[0985] Consensus pattern: [LIVMA]-H-P-[LIVM]-x-[KRQ]-[LIVMF](2)-x-[AP]-H- 
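
The consensus patterns in this section use a compact notation: `-` separates positions, `x` is any residue, `[..]` lists allowed residues, `{..}` lists excluded residues, and `(n)` or `(m,n)` gives a repeat count. The following is a minimal sketch, not part of the original disclosure, of how such a pattern could be translated into a regular expression for scanning a sequence. The function name `prosite_to_regex` and the test sequence are hypothetical, and only the notation elements listed above are handled.

```python
import re

def prosite_to_regex(pattern):
    """Translate a consensus pattern (as written in this section) to a regex string."""
    element_re = re.compile(
        r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})"   # one position: x, letter, [set] or {excluded set}
        r"(?:\((\d+)(?:,(\d+))?\))?"         # optional repeat: (n) or (m,n)
    )
    parts = []
    # Tolerate the trailing '-' that follows many patterns in the text.
    for element in pattern.strip().strip("-").split("-"):
        m = element_re.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = m.groups()
        if core == "x":
            core = "."                        # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"    # any residue except those listed
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)

# The two lipoxygenase patterns quoted above.
first = prosite_to_regex("H-[EQ]-x(3)-H-x-[LM]-[NQRC]-[GST]-H-[LIVMSTAC](3)-E")
second = prosite_to_regex("[LIVMA]-H-P-[LIVM]-x-[KRQ]-[LIVMF](2)-x-[AP]-H-")

# Hypothetical toy sequence (not a real lipoxygenase) containing the first motif.
toy_sequence = "AAHEAAAHALNGHLLLEAA"
hit = re.search(first, toy_sequence)
print(first, second, hit.start() if hit else None)
```

A regular expression reports only exact pattern matches; it does not reproduce profile-based scoring, so it is best read as a quick screening aid.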

[ 1] Vick B.A., Zimmerman D.C. (In) Biochemistry of plants: A comprehensive treatise, Stumpf P.K., Ed., Vol. 9,
pp.53-90, Academic Press, New-York, (1987).

[ 2] Needleman P., Turk J., Jakschik B.A., Morrison A.R., Lefkowith J.B. Annu. Rev. Biochem. 55:69-102(1986).
[ 3] Peng Y.L., Shirano Y., Ohta H., Hibino T., Tanaka K., Shibata D. J. Biol. Chem. 269:3755-3761(1994).
[ 4] Boyington J.C., Gaffney B.J., Amzel L.M. Science 260:1482-1486(1993).

[ 5] Steczko J., Donoho G.P., Clemens J.C., Dixon J.E., Axelrod B. Biochemistry 31:4053-4057(1992). 
[0986] 339. Fumarate lyases signature (lyase_1)

A number of enzymes belonging to the lyase class, for which fumarate is a substrate, have been shown [1,2] to share
a short conserved sequence around a methionine which is probably involved in the catalytic activity of this type of
enzyme. These enzymes are: - Fumarase (EC 4.2.1.2) (fumarate hydratase), which catalyzes the reversible hydration
of fumarate to L-malate. There seem to be 2 classes of fumarases: class I are thermolabile dimeric enzymes (as for
example: Escherichia coli fumA and fumB); class II enzymes are thermostable and tetrameric and are found in prokaryotes (as
for example: Escherichia coli fumC) as well as in eukaryotes. The sequences of the two classes of fumarases
are not closely related. - Aspartate ammonia-lyase (EC 4.3.1.1) (aspartase), which catalyzes the reversible conversion
of aspartate to fumarate and ammonia. This reaction is analogous to that catalyzed by fumarase, except that ammonia
rather than water is involved in the trans-elimination reaction. - Arginosuccinase (EC 4.3.2.1) (argininosuccinate lyase),
which catalyzes the formation of arginine and fumarate from argininosuccinate, the last step in the biosynthesis of
arginine. - Adenylosuccinase (EC 4.3.2.2) (adenylosuccinate lyase) [3], which catalyzes the eighth step in the de novo
biosynthesis of purines, the formation of 5'-phosphoribosyl-5-amino-4-imidazolecarboxamide and fumarate from
1-(5-phosphoribosyl)-4-(N-succino-carboxamide). That enzyme can also catalyze the formation of fumarate and AMP
from adenylosuccinate. - Pseudomonas putida 3-carboxy-cis,cis-muconate cycloisomerase (EC 5.5.1.2) (3-carboxymuconate
lactonizing enzyme) (gene pcaB) [4], an enzyme involved in aromatic acids catabolism.
[0987] Consensus pattern: G-S-x(2)-M-x(2)-K-x-N- 

[ 1] Woods S.A., Schwartzbach S.D., Guest J.R. Biochim. Biophys. Acta 954:14-26(1988).
[ 2] Woods S.A., Miles J.S., Guest J.R. FEMS Microbiol. Lett. 51:181-186(1988). 
[ 3] Zalkin H., Dixon J.E. Prog. Nucleic Acid Res. Mol. Biol. 42:259-287(1992). 

[ 4] Williams S.E., Woolridge E.M., Ransom S.C., Landro J.A., Babbitt P.C., Kozarich J.W. Biochemistry 31:9768-9776(1992).

[0988] 340. MCM family signature and profile 

Proteins shown to be required for the initiation of eukaryotic DNA replication share a highly conserved domain of about
210 amino-acid residues [1,2,3]. The latter shows some similarities [4] with that of various other families of DNA-dependent
ATPases. Eukaryotes seem to possess a family of six proteins that contain this domain. They were first
identified in yeast where most of them have a direct role in the initiation of chromosomal DNA replication by interacting
directly with autonomously replicating sequences (ARS). They were thus called 'minichromosome maintenance proteins'
with gene symbols prefixed by MCM. These six proteins are: - MCM2, also known as cdc19 (in S.pombe) [E1]
<PDOC00188> families [3, 1 9] U " O00499 > 30(1 1,16 cytosolc fatty- acid binding proteins 

[ 1] Cowan S.W., Newcomer M.E., Jones T.A. Proteins 8:44-61(1990).

[ 4] Godovac-Zimmermann J. Trends Biochem. Sci. 13:64-66(1988).
[ 5] Pervaiz S., Brew K. FASEB J. 1:209-214(1987).

[ 6] Kremer J.M.H., Witting J., Janssen L.H.M. Pharmacol. Rev. 40:1-47(1989).

2 E2K ^ ^J" 0 ' Je ""° D - Tschopp J. Mo,. ,mmuno.. 28 Si31(l991) 
[ 8] Keen J.N., Caceres I., Eliopoulos E.E., Zagalsky P.F., Findlay J.B.C. Eur. J. Biochem. 197:407-417(1991).
[ 9] Newcomer M.E. Structure 1:7-18(1993).

[10] Collet C., Joseph R. Biochim. Biophys. Acta 1167:219-222(1993).

■ISlf 0 ? 1 L ;? U,a ' ,e J - P ' °*>°®x *■ J- BW. Chm. 268:10274., 028HI9S31 
[19] Ftower D.RFEBS Lett. '^^993) ^ »** '80:69-74(1991). 

[0982] Cytosolic fatty-acid binding proteins signature (Iip2)

tenstrandedantiparallelbeta-barreLalbeit wTha wide S,Sn^2 h 9 t ,r0 T a . COmm0n anC8B,0t ™» «"«*»"» « a 
+ 1 topo.ogy enclosing an internal Hgand S^ 

specific, types 0. fatty acid binding ^oteinT(FAB?5 fou^dh tLr S 7 l0 ^ tolhi ^ ^.ude: - Six. tissue- 
Hear, FABP is aIso Known as mammaryirl^S ep^ermal, ad^ocyte. brain/retina." 
of mammary carcinoma cells. Epidermal FABP i aZ kZ„ !1 Z ° ■ P *" *** f9VerSibly inhibits P'°'»eration 
acid-binding proteins. - Testis lipW biXp^ TbT^I^T^^^ 131 '^^^ 
retinoic acid-binding protein (CRABR GaSrot^rli 2 ! « and II (CRBP). - Cellular 
secret™. ,t seems^t J^SZLZS^ ■«* - 
from the midgut of the insect Manduca sexta Ml In addit^ to 71. T ? ^ pr0,6ins MFB1 "* MFB2 
Myelin P2 protein, which may be a lipid - 
myelin. - Schistosoma mansoni proteh Sm^KJh M " S P2 ' S assoc *™ with the lipid Mayer of 
suum pis a secreted proteh Z n^v obv a SjErr T ° * " 1V0,Ved h *» ,rans P° rt * ' att y - Ascaris 
products or that may be inZrtlTe ™l? e l2l7£ ^ l °™ ,atty acWs ^ ** P"°xS>n 
acid-binding proteins FAOfT^ F^I^^ J ££ZZ r° °' *" e " She "- " fa ^ 
f^e proteins a segment from me N« e ^m^ uS" Caen0mabd «* ^ » * Mature pattern 
[0983, consensus pattern: P*"VKWFY^^ 

[ 1] Bernier I., Jolles P. Biochimie 69:1127-1152(1987).

(1994). - Ul0,er )ean L., Hellman U.. Saurat J.-H. Biochem. J. 302:363-371 

[ 4] Smith A.F., Tsuchida K.. Hanneman E., Suzuki TC Wells M a . Ri„, «. ~ 

I S, Moser D., Tend.er M., Griffiths G., K*,kert M. O J BfoK Chera tJS^S^^ 1 ^ 
dues]- 

[ 1] McAlister-Henn L. Trends Biochem. Sci. 13:178-181(1988).

[ 2] Gietl C. Biochim. Biophys. Acta 1100:217-234(1992).

[ 3] Birktoft J.J., Rhodes G., Banaszak L.J. Biochemistry 28:6065-6081(1989).

[ 4] Cendrin F., Chroboczek J., Zaccai G., Eisenberg H., Mevarech M. Biochemistry 32:4308-4313(1993).
[0972] 334. Legume lectins signatures 

Leguminous plants synthesize sugar-binding proteins which are called legume lectins [1,2]. These lectins are generally
found in the seeds. The exact function of legume lectins is not known but they may be involved in the attachment of
nitrogen-fixing bacteria to legumes and in the protection against pathogens. Legume lectins bind calcium and manganese
(or other transition metals). Legume lectins are synthesized as precursor proteins of about 230 to 260 amino acid
residues. Some legume lectins are proteolytically processed to produce two chains: beta (which corresponds to the
N-terminal) and alpha (C-terminal). The lectin concanavalin A (conA) from jack bean is exceptional in that the two chains
are transposed and ligated (by formation of a new peptide bond). The N-terminus of mature conA thus corresponds
to that of the alpha chain and the C-terminus to the beta chain. Two signature patterns specific to legume lectins have
been developed: the first is located in the C-terminal section of the beta chain and contains a conserved aspartic acid
residue important for the binding of calcium and manganese; the second one is located in the N-terminal section of the alpha
chain.

[0973] Consensus pattern: [LIV]-[STAG]-V-[DEQVHFLI]-D-[ST] [D binds manganese and calcium]- 
Consensus pattern: [LIV]-x-[EDQ]-[FYWKR]-V-x-[LIVF]-G-[LF]-[ST]- 

[ 1] Sharon N., Lis H. FASEB J. 4:3198-320(1990).

[ 2] Lis H., Sharon N. Annu. Rev. Biochem. 55:33-37(1986). 

[0974] 335. CoA-ligases (ligase-CoA)

[0975] This family includes the CoA ligases succinyl-CoA synthetase alpha and beta chains, malate CoA ligase and
ATP-citrate lyase. Some members of the family utilise ATP, others use GTP.

[0976] [1] Wolodko WT, Fraser ME, James MN, Bridger WA, J Biol Chem 1994;269:10883-10890. 

[0977] 336. Linker histone H1 and H5 family

[0978] Linker histone H1 is an essential component of chromatin structure. H1 links nucleosomes into higher order
structures. Histone H1 is replaced by histone H5 in some cell types.

[0979] [1] Ramakrishnan V, Finch JT, Graziano V, Lee PL, Sweet RM, Nature 1993;362:219-223.
[0980] 337. Lipocalin signature (lipt) 

Proteins which transport small hydrophobic molecules such as steroids, bilins, retinoids, and lipids share limited regions
of sequence homology and a common tertiary structure architecture [1 to 5]. This is an eight stranded antiparallel beta-barrel
with a repeated +1 topology enclosing an internal ligand binding site [1,3]. The name 'lipocalin' has been proposed
[5] for this protein family. Proteins known to belong to this family are listed below (references are only provided for
recently determined sequences). - Alpha-1-microglobulin (protein HC), which seems to bind porphyrin. - Alpha-1-acid
glycoprotein (orosomucoid), which can bind a remarkable array of natural and synthetic compounds [6]. - Aphrodisin
which, in hamsters, functions as an aphrodisiac pheromone. - Apolipoprotein D, which probably binds heme-related
compounds. - Beta-lactoglobulin, a milk protein whose physiological function appears to be to bind retinol. - Complement
component C8 gamma chain, which seems to bind retinol [7]. - Crustacyanin [8], a protein from lobster carapace, which
binds astaxanthin, a carotenoid. - Epididymal-retinoic acid binding protein (E-RABP) [9] involved in sperm maturation.
- Insectacyanin, a moth bilin-binding protein, and a related butterfly bilin-binding protein (BBP). - Late lactation protein
(LALP), a milk protein from tammar wallaby [10]. - Neutrophil gelatinase-associated lipocalin (NGAL) (p25) (SV-40
induced 24p3 protein) [11]. - Odorant-binding protein (OBP), which binds odorants. - Plasma retinol-binding proteins
(PRBP). - Human pregnancy-associated endometrial alpha-2 globulin. - Probasin (PB), a rat prostatic protein. - Prostaglandin
D synthase (EC 5.3.99.2) (GSH-independent PGD synthetase), a lipocalin with enzymatic activity [12]. -
Purpurin, a retinal protein which binds retinol and heparin. - Quiescence specific protein p20K from chicken (embryo
CH21 protein). - Rodent urinary proteins (alpha-2-microglobulin), which may bind pheromones. - VNSP 1 and 2, putative
pheromone transport proteins from mouse vomeronasal organ [13]. - Von Ebner's gland protein (VEGP) [14] (also
called tear lipocalin), a mammalian protein which may be involved in taste recognition. - A frog olfactory protein, which
may transport odorants. - A protein found in the cerebrospinal fluid of the toad Bufo marinus with a supposed function
similar to transthyretin in transport across the blood brain barrier [15]. - Lizard's epididymal secretory protein IV (LESP
IV), which could transport small hydrophobic molecules into the epididymal fluid during sperm maturation [16]. - Prokaryotic
outer-membrane protein blc [17]. The sequences of most members of the family, the core or kernel lipocalins, are
characterized by three short conserved stretches of residues [3,18]. Others, the outlier lipocalin group, share only one
ribC in Escherichia coli. ribB in Bacillus subtilis and pS»?21 ! Sy " thaSe alpha (RS-alpha) (gene 

"boflavhfrom two moles of 

: phoreum lumazine protein (LumP) (gVne lux L ? ZSE^S^'*^^ 

emission of bacterial luciferase. In the presence ^TlumP .E l L ^ °' bi ^™escence 

wavelength). LumP binds non^ovalentJT ™££^^ " '° « W values Shorter 

proteh (YFP) (gene luxY). Like LumP. ^P n^M^^^^Z^; ^ ye "° W **™»** 

covalently to FMN. These proteins seem to f^e evo^e^lrom *a^imr ° Warc ^ a ' on 9 er wavelength. YFP binds non- 

► C ^™alsectk*nthisdoma^^ 

site for Lum.RS-alpha which binds two rnoleTulTrf ,^ 1 ,_ . Pr0f>0Sed t0 b ° *• bindin 9 

one mo.ecu.e of Lum, has a Glu inst^Sg h , h e £ SS?i° P '" "J* ** LumP *** bin * 
which binds to one molecule of FMN, also seerre fohave aatofcl rt^f second copy of the motif. Similarity, YFP, 
for Glu in the test positionof the first copy JlTr^Zt^!^ ^ ^ * 8utettuton <" G1 V 
[0962, Consensus pattern: \UV M ^y G 4^^Z^^ S "* LBml *"** 

[0963] 331. Lysyl oxidase putative copper-binding region signature

aldehyde cross-links.LOX binds a single copper ^Z^ Je^Z^l T ^ *'* ,hen ab,e to ,0 ™ 

which includes at least three histidine ligands FouSdTne m ^^ Tai ^^^P^ 
This region is thought to be involved in coooer bwt ! h T « ! ^ C ' US,ered h 3 Cen,fal «#" °* »» enzyme 
signature pattern. H-.nu.ng ano ,s caned the 'copper-talon" [1]. This region was used as a 

[0964] Consensus pattern: W-E-W-H-S-C-H-Q-H-Y-H 

[0968] 333. L-lactate dehydrogenase active site (MM) *W1«-4921. 

tetramenc enzyme is present in prokaryoticand eukarvotic o^nt™ , 1 V S,6P " anaerobic glycolysis. This 
theM, om(LDH ^^ 

(LDH-C), found only in the spermatozoa of mammals ano birds Vn wll m " 3rt mUSCle and the X to ™ 

astructura.proteinandiskno^asepsi^ 

£ catalyzes the reversible «, steLpe^^ 1.1.1-)(L-h fc DH) 
boxyte acids. L-hicDH is evolutionary related to LDH's A B ?ItaS^SS^ ,B ^ ^ ^^xy-car- 
a conserved histidine whk* is essente. to the catalytic rr^Sm ^ Se,eCted ^ includes 

[0969] Consensus pattern: [LIVMA]-G-[EQ]-H-G-[DN]-[ST] [H is the active site residue] -

U.S.A. 85:7114-7118(1988). 9 * ' Bloemendal H • * Jong W W. Proc. Natl. Acad. Sci. 

[ 3] Lerch H.-P, Frank R., Collins J. Gene 83:263-270(1989). 

[0970] Malate dehydrogenase active site signature (Idh2)
prokaryoticorganismsUainsasingTeZo^^^^^^ 

* the mitochondrial matnx and the Ither in mec^olZ F T^T T? 7 ^ ° ne which is ,oca, « 1 

functbns in the gtyoxylate pathway. In planW^S^^ 2? ^ 3 9'W°™' form which 
JLL182) which is essential for both the SS^Sj^ . a " < f* Wl0nal NADP^ependent form of MDH (EC 

As a signature pattern for this m^a^^^SSS!^^ ^T* mow cyc.e. 

anism [3]: an aspartic ackf which is involved h aZ^n^SSSf 1^ " me m <*"- 
[0971, Census pattern: ^FBK^^ 
(for recent listings see [1,2,3]): - Major outer membrane lipoprotein (murein-lipoproteins) (gene lpp). - Escherichia coli
lipoprotein-28 (gene nlpA). - Escherichia coli lipoprotein-34 (gene nlpB). - Escherichia coli lipoprotein nlpC. - Escherichia
coli lipoprotein nlpD. - Escherichia coli osmotically inducible lipoprotein B (gene osmB). - Escherichia coli
osmotically inducible lipoprotein E (gene osmE). - Escherichia coli peptidoglycan-associated lipoprotein (gene pal). -
Escherichia coli rare lipoproteins A and B (genes rlpA and rlpB). - Escherichia coli copper homeostasis protein cutF
(or nlpE). - Escherichia coli plasmids traT proteins. - Escherichia coli Col plasmids lysis proteins. - A number of Bacillus
beta-lactamases. - Bacillus subtilis periplasmic oligopeptide-binding protein (gene oppA). - Borrelia burgdorferi outer
surface proteins A and B (genes ospA and ospB). - Borrelia hermsii variable major protein 21 (gene vmp21) and 7
(gene vmp7). - Chlamydia trachomatis outer membrane protein 3 (gene omp3). - Fibrobacter succinogenes endoglucanase
cel-3. - Haemophilus influenzae proteins Pal and Pcp. - Klebsiella pullulanase (gene pulA). - Klebsiella pullulanase
secretion protein pulS. - Mycoplasma hyorhinis protein p37. - Mycoplasma hyorhinis variant surface antigens
A, B, and C (genes vlpABC). - Neisseria outer membrane protein H.8. - Pseudomonas aeruginosa lipopeptide (gene
lppL). - Pseudomonas solanacearum endoglucanase egl. - Rhodopseudomonas viridis reaction center cytochrome
subunit (gene cytC). - Rickettsia 17 Kd antigen. - Shigella flexneri invasion plasmid proteins mxiJ and mxiM. - Streptococcus
pneumoniae oligopeptide transport protein A (gene amiA). - Treponema pallidum 34 Kd antigen. - Treponema
pallidum membrane protein A (gene tmpA). - Vibrio harveyi chitobiase (gene chb). - Yersinia virulence plasmid protein
yscJ. - Halocyanin from Natronobacterium pharaonis [4], a membrane associated copper-binding protein. This is the
first archaebacterial protein known to be modified in such a fashion. From the precursor sequences of all these proteins,
a consensus pattern and a set of rules to identify this type of post-translational modification were derived.
[0957] Consensus pattern: {DERK}(6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C [C is the lipid attachment
site] Additional rules: 1) The cysteine must be between positions 15 and 35 of the sequence in consideration. 2) There
must be at least one Lys or one Arg in the first seven positions of the sequence.

[ 1] Hayashi S., Wu H.C. J. Bioenerg. Biomembr. 22:451-471(1990).
[ 2] Klein P., Somorjai R.L., Lau P.C.K. Protein Eng. 2:15-20(1988).
[ 3] von Heijne G. Protein Eng. 2:531-534(1989).

[ 4] Mattar S., Scharf B., Kent S.B.H., Rodewald K., Oesterhelt D., Engelhard M. J. Biol. Chem. 269:14939-14945(1994).

[0958] 329. (Lipoprotein 5) Prokaryotic membrane lipoprotein lipid attachment site. In prokaryotes, membrane lipoproteins
are synthesized with a precursor signal peptide, which is cleaved by a specific lipoprotein signal peptidase
(signal peptidase II). The peptidase recognizes a conserved sequence and cuts upstream of a cysteine residue to
which a glyceride-fatty acid lipid is attached [1]. Some of the proteins known to undergo such processing currently
include (for recent listings see [1,2,3]): - Major outer membrane lipoprotein (murein-lipoproteins) (gene lpp). - Escherichia
coli lipoprotein-28 (gene nlpA). - Escherichia coli lipoprotein-34 (gene nlpB). - Escherichia coli lipoprotein
nlpC. - Escherichia coli lipoprotein nlpD. - Escherichia coli osmotically inducible lipoprotein B (gene osmB). - Escherichia
coli osmotically inducible lipoprotein E (gene osmE). - Escherichia coli peptidoglycan-associated lipoprotein (gene pal).
- Escherichia coli rare lipoproteins A and B (genes rlpA and rlpB). - Escherichia coli copper homeostasis protein cutF
(or nlpE). - Escherichia coli plasmids traT proteins. - Escherichia coli Col plasmids lysis proteins. - A number of Bacillus
beta-lactamases. - Bacillus subtilis periplasmic oligopeptide-binding protein (gene oppA). - Borrelia burgdorferi outer
surface proteins A and B (genes ospA and ospB). - Borrelia hermsii variable major protein 21 (gene vmp21) and 7
(gene vmp7). - Chlamydia trachomatis outer membrane protein 3 (gene omp3). - Fibrobacter succinogenes endoglucanase
cel-3. - Haemophilus influenzae proteins Pal and Pcp. - Klebsiella pullulanase (gene pulA). - Klebsiella pullulanase
secretion protein pulS. - Mycoplasma hyorhinis protein p37. - Mycoplasma hyorhinis variant surface antigens
A, B, and C (genes vlpABC). - Neisseria outer membrane protein H.8. - Pseudomonas aeruginosa lipopeptide (gene
lppL). - Pseudomonas solanacearum endoglucanase egl. - Rhodopseudomonas viridis reaction center cytochrome
subunit (gene cytC). - Rickettsia 17 Kd antigen. - Shigella flexneri invasion plasmid proteins mxiJ and mxiM. - Streptococcus
pneumoniae oligopeptide transport protein A (gene amiA). - Treponema pallidum 34 Kd antigen. - Treponema
pallidum membrane protein A (gene tmpA). - Vibrio harveyi chitobiase (gene chb). - Yersinia virulence plasmid protein
yscJ. - Halocyanin from Natronobacterium pharaonis [4], a membrane associated copper-binding protein. This is the first
archaebacterial protein known to be modified in such a fashion. From the precursor sequences of all these proteins,
a consensus pattern and a set of rules to identify this type of post-translational modification have been developed.
[0959] Consensus pattern: {DERK}(6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C [C is the lipid attachment
site] Additional rules: 1) The cysteine must be between positions 15 and 35 of the sequence in consideration. 2) There
must be at least one Lys or one Arg in the first seven positions of the sequence.
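
The consensus pattern and the two additional rules can be applied together as a simple filter. The sketch below is illustrative only and is not taken from the original disclosure: the function name `lipid_attachment_sites` and the short test sequences are hypothetical, positions are counted from 1 at the start of the precursor sequence, and the pattern is transcribed by hand into a regular expression.

```python
import re

# {DERK}(6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C, wrapped in a lookahead so
# that overlapping candidate matches are all reported.
LIPOBOX = re.compile(r"(?=([^DERK]{6}[LIVMFWSTAG]{2}[LIVMFYSTAGCQ][AGS]C))")

def lipid_attachment_sites(sequence):
    """Return 1-based positions of cysteines satisfying the pattern and both rules."""
    sequence = sequence.upper()
    # Rule 2: at least one Lys or Arg in the first seven positions.
    if not re.search(r"[KR]", sequence[:7]):
        return []
    sites = []
    for match in LIPOBOX.finditer(sequence):
        # The pattern spans 11 residues and the cysteine is the last one.
        cys_position = match.start(1) + 11
        # Rule 1: the cysteine must lie between positions 15 and 35.
        if 15 <= cys_position <= 35:
            sites.append(cys_position)
    return sites

# Hypothetical toy precursor, written to resemble a lipoprotein signal peptide.
print(lipid_attachment_sites("MKATKLVLGAVILGSTLLAGCSSNAKIDQ"))
```

Applying the positional rules after the pattern match keeps the regular expression identical to the published pattern, so each rule can be relaxed or tightened independently.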

[0960] [ 1] Hayashi S., Wu H.C. J. Bioenerg. Biomembr. 22:451-471(1990). [ 2] Klein P., Somorjai R.L., Lau P.C.K.
Protein Eng. 2:15-20(1988). [ 3] von Heijne G. Protein Eng. 2:531-534(1989). [ 4] Mattar S., Scharf B., Kent S.B.H.,
Rodewald K., Oesterhelt D., Engelhard M. J. Biol. Chem. 269:14939-14945(1994).
[1] Wirtz K.W.A. Annu. Rev. Biochem. 60:73-99(1991).
[2] Arondel V., Kader J.C. Experientia 46:579-585(1990).

[3] Ohlrogge J.B., Browse J., Somerville C.R. Biochim. Biophys. Acta 1082:1-26(1991).
f^Smf? *2 Lysosomeassocia «^ membrane glycoproteins signatures 

rysosome-.umina. domains separated ££S EST "T °' h0m0k *° us 

brane region followed by a vwy short ctoteZLw In * ^1 ? , ^ C " ,erm,na ' ex,remi,v *™ * a transmem- 
disulfide bond, This stricture* To^Z^J^t^^ *« are two conserved 



d.sulf.de bond, Th,s structure is schematically represented in the figure below 
+ „ .Hum, 

KXXXTYrY¥YYvrwv»vv»»«- r> 

KXXXXX 



xCxxxxxCxxxxxxxxxxxxCxxxxxCxxxxxxxxxCxxxxxCxxxxxxxxxxxxCxxxxxCxxxj 



-xHingex xTMxO 



lnmammals,therearetwocloselyrelatedtypesof temp- larrm-1 andlama ? i„rh,>i,« . 

macrophage protein CD68 (or macrosia.ini ,2] is a ^9^^^^ '^wnasLEPIOO.The 
consists of a mucin-like domain followed bv a oroline rirh hi„I ■ ? 9 ^brane protein whose structure 
and a short cytoptesmic tail. tJ^^^^^^- * m * : 3 ,ra "^mbrane region 

on the first conserved cysteine d he ^SSSSSJT tT h X "T" 8 ^ deV6l0ped - The first °™ s 

[ 1] Fukuda M. J. Biol. Chem. 266:21327-21330(1991) 

[ 2] Holness C.L., da Silva R.P., Fawcett J., Gordon S., Simmons D.L. J. Biol. Chem. 268:9661-9666(1993).

!mI2 T TP? enZym6S ' G '°- S - L ' ,an,il * serine act ™ sfte 
[^RecentlyftJ.afamilyofnpo.yt.enzyn.shasbeenc 

: 

- Vibrio mimicus arylesterase. 

- Escherichia coli acyl-CoA thioesterase I (gene tesA)

- R?rT,, P nH raha K? ly,iCUS the,mo,abile hemolysin/atypica. phospholipase 

320 amino acids P 3ke °* " p,ds - AdRat >- B «=°n«ains four repeats of about 

- Arabidopsis thaliana and Brassica napus anther-specific proline-rich protein APG

quence motif that can be used as a ^Z^^^T^ " * '° C " d h 3 d - 

- Consensus pattern: [LIVMFYAG](4)-G-D-S-[LIVM]-x(1,2)-[TAG]-G [S is the active site residue]

(signalpeptidase .I). The peptidase «4^cS^t2^^ ? 3 SpeCi " C Hpopr ° ,ein si 9 nal P***" 
division of vulval blast cells. - Vertebrate insulin gene enhancer binding protein isl-1. Isl-1 binds to one of the two cis-acting
protein-binding domains of the insulin gene. - Vertebrate homeobox proteins lim-1, lim-2 (lim-5) and lim3. -
Vertebrate lmx-1, which acts as a transcriptional activator by binding to the FLAT element, a beta-cell-specific transcriptional
enhancer found in the insulin gene. - Mammalian LH-2, a transcriptional regulatory protein involved in the
control of cell differentiation in developing lymphoid and neural cell types. - Drosophila protein apterous, required for
the normal development of the wing and haltere imaginal discs. - Vertebrate protein kinases LIMK-1 and LIMK-2. -
Mammalian rhombotins. Rhombotin 1 (RBTN1 or TTG-1) and rhombotin-2 (RBTN2 or TTG-2) are proteins of about
160 amino acids whose genes are disrupted by chromosomal translocations in T-cell leukemia. - Mammalian and avian
cysteine-rich protein (CRP), a 192 amino-acid protein of unknown function. Seems to interact with zyxin. - Mammalian
cysteine-rich intestinal protein (CRIP), a small protein which seems to have a role in zinc absorption and may function
as an intracellular zinc transport protein. - Vertebrate paxillin, a cytoskeletal focal adhesion protein. - Mouse testin.
Mouse testin should not be confused with rat testin which is a thiol protease homolog. - Sunflower pollen specific protein
SF3. - Chicken zyxin. Zyxin is a low-abundance adhesion plaque protein which has been shown to interact with CRP.
- Yeast protein LRG1 which is involved in sporulation [4]. - Yeast rho-type GTPase activating protein RGA1/DBM1. -
Caenorhabditis elegans homeobox protein ceh-14. - Caenorhabditis elegans homeobox protein unc-97. - Yeast hypothetical
protein YKR090w. - Caenorhabditis elegans hypothetical protein C28H8.6. These proteins generally have two
tandem copies of a domain, called LIM (for Lin-11 Isl-1 Mec-3), in their N-terminal section. Zyxin and paxillin are exceptions
in that they contain respectively three and four LIM domains at their C-terminal extremity. In apterous, isl-1, LH-2,
lin-11, lim-1 to lim-3, lmx-1, ceh-14 and mec-3 there is a homeobox domain some 50 to 95 amino acids after
the LIM domains. In the LIM domain, there are seven conserved cysteine residues and a histidine. The arrangement
followed by these conserved residues is C-x(2)-C-x(16,23)-H-x(2)-[CH]-x(2)-C-x(2)-C-x(16,21)-C-x(2,3)-[CHD]. The
LIM domain binds two zinc ions [5]. LIM does not bind DNA; rather, it seems to act as an interface for protein-protein
interaction. A pattern was developed that spans the first half of the LIM domain.

[0947] Consensus pattern: C-x(2)-C-x(15,21)-[FYWH]-H-x(2)-[CH]-x(2)-C-x(2)-C-x(3)-[LIVMF] [The 5 C's and the H
bind zinc]
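
As an illustration of how the variable-length spacer x(15,21) and the annotated zinc ligands might be handled in practice, the following sketch transcribes the LIM pattern into a regular expression with one capture group per ligand position. It is not part of the original disclosure. The names `LIM_PATTERN` and `lim_zinc_ligands` and the synthetic test sequence are hypothetical, and treating the [CH] position as one of the six annotated ligands is an assumption made for this example.

```python
import re

# C-x(2)-C-x(15,21)-[FYWH]-H-x(2)-[CH]-x(2)-C-x(2)-C-x(3)-[LIVMF]
# Capture groups mark the residues assumed here to be the zinc ligands.
LIM_PATTERN = re.compile(
    r"(C).{2}(C).{15,21}[FYWH](H).{2}([CH]).{2}(C).{2}(C).{3}[LIVMF]"
)

def lim_zinc_ligands(sequence):
    """Return, for each pattern match, the 1-based positions of the six ligands."""
    hits = []
    for match in LIM_PATTERN.finditer(sequence.upper()):
        hits.append([match.start(group) + 1 for group in range(1, 7)])
    return hits

# Synthetic sequence (not a real protein) built to satisfy the pattern
# with a 16-residue spacer.
toy = "C" + "AA" + "C" + "A" * 16 + "FH" + "AA" + "C" + "AA" + "C" + "AA" + "C" + "AAA" + "L"
print(lim_zinc_ligands(toy))
```

Because the spacer length is variable, the regex engine tries each permitted length in turn; the reported positions therefore depend on the match actually found rather than on fixed offsets.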

[ 1] Freyd G., Kim S.K., Horvitz H.R. Nature 344:876-879(1990). 

[ 2] Baltz R., Evrard J.-L., Domon C., Steinmetz A. Plant Cell 4:1465-1466(1992).

[ 3] Sanchez-Garcia I., Rabbitts T.H. Trends Genet. 10:315-320(1994).

[ 4] Mueller A., Xu G., Wells R., Hollenberg C.P., Piepersberg W. Nucleic Acids Res. 22:3151-3154(1994).

[ 5] Michelsen J.W., Schmeichel K.L., Beckerle M.C., Winge D.R. Proc. Natl. Acad. Sci. U.S.A. 90:4404-4408(1993).

[0948] 324. (LRR) Leucine Rich Repeat 

CAUTION: This Pfam may not find all Leucine Rich Repeats in a protein. Leucine Rich Repeats are short sequence
motifs present in a number of proteins with diverse functions and cellular locations. These repeats are usually involved 
in protein-protein interactions. Each Leucine Rich Repeat is composed of a beta-alpha unit. These units form elongated 
non-globular structures. Leucine Rich Repeats are often flanked by cysteine rich domains. Number of members: 3017 
[1] The leucine-rich repeat: a versatile binding motif. Kobe B, Deisenhofer J; Trends Biochem Sci 1994;19:415-421. 
[2] Crystal structure of porcine ribonuclease inhibitor, a protein with leucine-rich repeats. Kobe B, Deisenhofer J; Nature
1993;366:751-756. 

[0949] 325. Plant lipid transfer protein family signature (LTP) 

[0950] Plant cells contain proteins, called lipid transfer proteins (LTP) [1,2,3], which are able to facilitate the transfer
of phospholipids and other lipids across membranes. These proteins, whose subcellular location is not yet known, could
play a major role in membrane biogenesis by conveying phospholipids such as waxes or cutin from their site of biosynthesis
to membranes unable to form these lipids. Plant LTP's are proteins of about 9 Kd (90 amino acids) which
contain eight conserved cysteine residues all involved in disulfide bridges, as shown in the following schematic representation.

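
xCxxxxCxxxxxxCCxxxxxxxxCxCxxxxxxxxxxxCxxxxxxCxx 

[Schematic: the lines drawing the disulfide bridges between the eight cysteines, and the row marking the position of the pattern, are not recoverable from the source.] 

'C': conserved cysteine involved in a disulfide bond. 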
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gatbn of nonexchange chromosomes during meiosis - Human CENP-F ut rcwo c • . ■ . 
kinetochores during chromosome conn re «L r^'=, !T 1 J ' CENP ' E 18 a that associates with 

discarded at the end of ZSSe SSSSf"- ■ pt * i mU »~ 31 anaphaSe - 30(1 fc dilative* 
and/ or spindte elongation. - Huma^ Z^s*^ZlZ?\ TrL^T • "~« 

toward the microtubule's plus end - Yeast KaS or^l ^ <MKLfM) : ? 4 motof P rote,n «*«• activity is directed 
KAR3 may med„e m JLbu.e s^X^^ ™"* 

body duplicate during mitoS SZ£ ^ Eme^e^ ^n^"s T ^ ^ * tor Spind,e 

- Emericella nidulans klpA. - Caenorhabdit iv eToans urc ZS. " ' rnPOrtan, r °' e h nUC,ear division - 

needed for neuronal ce5 differentiaU^en^ xZS^ ? SUbSt3nCeS 

mitosis. - Arabrdopsis thaliana KatA. KatB and katr rhtlZ T • Xenopus E 9 5 ' whlcn ™y be involved in 

thefcal protem T09A5.2.The kinesin motor domain is located in the frminal cS of n 2?S^ ^ hVP °" 
the exception of KAR3. WpA, and ncd where it is locatedin^r^lT mos * ° f «heabove proteins, with 
about 330 aminoacids. M ATP^nd^amoL ^S h sectwn.The kinesin motor domain contains 

invofcedinmicrotubu.e^ 

the microtubule-binding part ^ d ° ma,n BdenVed ,r0m a conse ' ved "^peptide hsjdB 

Consensus pattern: [GSA]-[KRHPSTQVM]-[LIVMF]-x-[LIVMF]-[IVC]-D-L-[AH]-G-[SAN]-E 

[ 1] Bloom G.S., Endow S.A. Protein Prof. 2:1109-1171(1995). 

[ 2] Vallee R.B., Shpetner H.S. Annu. Rev. Biochem. 59:909-932(1990). 

[ 3] Brady S.T. Trends Cell Biol. 5:159-164(1995). 

[ 4] Endow S.A. Trends Biochem. Sci. 16:221-225(1991). [E1] 

[0942] 321. Ribosomal protein L15 signature 
Se^XTbe^a^ 

Eubacterial L15. - Plan t chloroolas U5 rn^Ti ^ * L ° 638,3 °' Sequence [U groups: - 

thermophib L29. - fSJSSS ^T^SSSS* " * Chae f baC,erial L15 - " Ve « L27a. - Teira^ymena 

!^(3H~3)T" 
-Cp"^ 

•he outer membrane of all Gramme bac^ T^lbS (LPS) ' 3 8 *»* U present in 

and may be responsible for the secreL oSa TNF ^riL ? P ' n,eraCt Wfth ,he CD14 rece P ,or 

BPI binds LPS and has a cytotoxic aSi* cTg™ < BP '>- Like LBp 

CETP is invoked in the transfeXns^LTote^^^ 
protein (PLTP). May play a keyXSTxT^^ 

proteins are structural LSaS £ S T<° ' ""T "* * HDL Thes ° 

regions was selected, which Si in mT^LS ° f**™""*"** *» * signature pattern one of these 
the binding to the lipids^] ^ * * eSe PK>MnS ' 8 re 9 ion could °° Solved in 

Consensus pattern: [PA]-[GA]-[LIVMC]-x(2)-R-[IV]-[ST]-x(3)-L-x(5)-[EQ]-x(4)-[LIVM]-[EQK]-x(8)-P 

(3ST ' W ' Ra99S G - LCOn9 S GUmina R J - J ' 001 C E - E,sbach P J- BW- Chem. 264:9505-9509 
^7^^ C ^ Gran, F.J., C-Hara P.,. Marcov.a S.M., 

[0946] 323. LIM domain signature and profile 

which inhibit both cathepsin D (aspartic proteinase) and trypsin. - Alpha-amylase/subtilisin inhibitors from barley and 
wheat. - Albumin-1 (WBA-1) from goa bean seeds [3]. - Miraculin from Richadella dulcifica [4], a sweet taste protein. 
- Sporamin from sweet potato [5], the major tuberous root protein. - Thiol proteinase inhibitor PCPI 8.3 (P340) from 
potato tuber [6]. - Wound responsive protein gwin3 from poplar tree [7]. - 21 Kd seed protein from cocoa [8]. All these 
proteins contain from 170 to 200 amino acid residues and one or two intrachain disulfide bonds. The best conserved 
region is found in their N-terminal section and is used as a signature pattern. 
[0938] Consensus pattern: [LIVM]-x-D-x-[EDNTY]-[DG]-[RKHDENQ]-x-[LIVM]-x(5)-Y-x-[LIVM] 

[ 1] Laskowski M., Kato I. Annu. Rev. Biochem. 49:593-626(1980). 

[ 2] Ritonja A., Krizaj I., Mesko P., Kopitar M., Lucovnik P., Strukelj B., Pungercar J., Buttle D.J., Barrett A.J., Turk 
V. FEBS Lett. 267:13-15(1990). 

[ 3] Kortt A.A., Strike P.M., de Jersey J. Eur. J. Biochem. 181:403-408(1989). 

[ 4] Theerasilp S., Hitotsuya H., Nakajo S., Nakaja K., Nakamura Y., Kurihara Y. J. Biol. Chem. 264:6655-6659 
(1989). 

[ 5] Hattori T., Yoshida N., Nakamura K. Plant Mol. Biol. 13:563-572(1989). 

[ 6] Krizaj I., Drobnic-Kosorok M., Brzin J., Jerala R., Turk V. FEBS Lett. 333:15-20(1993). 

[ 7] Bradshaw H.D., Hollick J.B., Parsons T.J., Clarke H.R.G., Gordon M.P. Plant Mol. Biol. 14:51-59(1989). 

[ 8] Tai H., McHenry L., Fritz P.J., Furtek D.B. Plant Mol. Biol. 16:913-915(1991). 

[0939] 319. Beta-ketoacyl synthases active site 

Beta-ketoacyl-ACP synthase (KAS) [1] is the enzyme that catalyzes the condensation of malonyl-ACP with the growing 
fatty acid chain. It is found as a component of the following enzymatic systems: - Fatty acid synthetase (FAS), which 
catalyzes the formation of long-chain fatty acids from acetyl-CoA, malonyl-CoA and NADPH. Bacterial and plant chlo- 
roplast FAS are composed of eight separate subunits which correspond to different enzymatic activities; beta-ketoacyl 
synthase is one of these polypeptides. Fungal FAS consists of two multifunctional proteins, FAS1 and FAS2; the beta- 
ketoacyl synthase domain is located in the C-terminal section of FAS2. Vertebrate FAS consists of a single 
multifunctional chain; the beta-ketoacyl synthase domain is located in the N-terminal section [2]. - The multifunctional 
6-methylsalicylic acid synthase (MSAS) from Penicillium patulum [3]. This is a multifunctional enzyme involved in the 
biosynthesis of a polyketide antibiotic and which has a KAS domain in its N-terminal section. - Polyketide antibiotic synthase 
enzyme systems. Polyketides are secondary metabolites produced by microorganisms and plants from simple fatty 
acids. KAS is one of the components involved in the biosynthesis of the Streptomyces polyketide antibiotics granaticin 
[4], tetracenomycin C [5] and erythromycin. - Emericella nidulans multifunctional protein Wa. Wa is involved in the 
biosynthesis of conidial green pigment. Wa is a protein of 216 Kd that contains a KAS domain. - Rhizobium nodulation 
protein nodE, which probably acts as a beta-ketoacyl synthase in the synthesis of the nodulation Nod factor fatty acyl 
chain. - Yeast mitochondrial protein CEM1. The condensation reaction is a two-step process: the acyl component of 
an activated acyl primer is transferred to a cysteine residue of the enzyme and is then condensed with an activated 
malonyl donor with the concomitant release of carbon dioxide. The sequence around the active site cysteine is well 
conserved and can be used as a signature pattern. 

[0940] Consensus pattern: G-x(4)-[LIVMFAP]-x(2)-[AGC]-C-[STA](2)-[STAG]-x(3)-[LIVMF] [C is the active site resi- 
due] 
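
Because the pattern above fixes the active site cysteine at a known offset, a scan can report the position of that residue directly. The following is a minimal sketch under stated assumptions: the regular expression is a hand translation of the consensus pattern, the function name `active_site_cysteines` is hypothetical, and the sequence is synthetic, built only to satisfy the pattern.

```python
import re

# Hand translation of G-x(4)-[LIVMFAP]-x(2)-[AGC]-C-[STA](2)-[STAG]-x(3)-[LIVMF];
# the active site cysteine is placed in a capture group.
KAS_ACTIVE_SITE = re.compile(
    r"G.{4}[LIVMFAP].{2}[AGC](C)[STA]{2}[STAG].{3}[LIVMF]"
)

def active_site_cysteines(sequence):
    """Return 1-based positions of the cysteine captured by the pattern."""
    return [m.start(1) + 1 for m in KAS_ACTIVE_SITE.finditer(sequence)]

# Synthetic example, not a real beta-ketoacyl synthase sequence.
synthetic = "MKGPSVSVDTACSSSLVALHQ"
print(active_site_cysteines(synthetic))
```

A reported position only marks a candidate residue; the pattern alone does not establish catalytic activity.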



[ 1] Kauppinen S., Siggaard-Andersen M., von Wettstein-Knowles P. Carlsberg Res. Commun. 53:357-370(1988). 

[ 2] Witkowski A., Rangan V.S., Randhawa Z.I., Amy C.M., Smith S. Eur. J. Biochem. 198:571-579(1991). 

[ 3] Beck J., Ripka S., Siegner A., Schiltz E., Schweizer E. Eur. J. Biochem. 192:487-498(1990). 

[ 4] Bibb M.J., Biro S., Motamedi H., Collins J.F., Hutchinson C.R. EMBO J. 8:2727-2736(1989). 

[ 5] Sherman D.H., Malpartida F., Bibb M.J., Kieser H.M., Bibb M.J., Hopwood D.A. EMBO J. 8:2717-2725(1989). 

[0941] 320. Kinesin motor domain signature and profile 

Kinesin [1,2,3] is a microtubule-associated force-producing protein that may play a role in organelle transport. Kinesin 
is an oligomeric complex composed of two heavy chains and two light chains. The kinesin motor activity is directed 
toward the microtubule's plus end. The heavy chain is composed of three structural domains: a large globular N-terminal 
domain which is responsible for the motor activity of kinesin (it is known to hydrolyze ATP, to bind and move on 
microtubules); a central alpha-helical coiled coil domain that mediates the heavy chain dimerization; and a small globular 
C-terminal domain which interacts with other proteins (such as the kinesin light chains), vesicles and membranous 
organelles. A number of proteins have been recently found that contain a domain similar to that of the kinesin 'motor' 
domain [1,4,E1]: - Drosophila claret segregational protein (ncd). Ncd is required for normal chromosomal segregation 
in meiosis in females, and in early mitotic divisions of the embryo. The ncd motor activity is directed toward the 
microtubule's minus end. - Drosophila kinesin-like protein (nod). Nod is required for the distributive chromosome segre- 

[ 1] Neuwald A.F., York J.D., Majerus P.W. FEBS Lett. 294:16-18(1991). 

[ 2] Glaeser H.-U., Thomas D., Gaxiola R., Montrichard F., Surdin-Kerjan Y., Serrano R. EMBO J. 12:3105-3110 
[ 3] Bone R., Springer J.R., Atack J.R. Proc. Natl. Acad. Sci. U.S.A. 89:10031-10035(1992). 
[0924] 313. Ion transport protein 

cte.rchi. Ci he aalvay o . lw£SSS2' L . ' NADP«tep«M„„ » dote™. In E s - 

^1^^;^"^ v ■ l4 — a ° E •»■• ^ »»■ »* *d 

[ 2] Cupp J.R., McAlister-Henn L. J. Biol. Chem. 266:22199-22205(1991). 

[ 3] Imada K., Sato M., Tanaka N., Katsube Y., Matsuura Y., Oshima T. J. Mol. Biol. 222:725-738(1991). 
[ 4] Zhang T., Koshland D.E. Jr. Protein Sci. 4:84-92(1995). 
[ 5] Tipton P.A., Beecher B.S. Arch. Biochem. Biophys. 313:15-21(1994). 

[0928] 315. Jacalin-like lectin domain 

iSa"' SWM «»W»- 1 K- Ban.,,,. R, Sna,™ V . s»* A. V**» M: N„ S» M Bio, ,996:3: 

[0931] 316. KH domain 

[0932] KH motifs probably bind RNA directly. Auto antibodies to Nova, a KH domain protein, cause paraneoplastic 
opsoclonus ataxia. 

[1] Burd CG, Dreyfuss G, Science 1994;265:615-621. 

[2] Musco G, Stier G, Joseph C, Castiglione Morelli MA, Nilges M, Gibson TJ, Pastore A, Cell 1996;85:237-245. 
[0933] 317. Kelch motif 

[0934] The kelch motif was initially discovered in Kelch ^wi«-nn^w\ i« *u- 

motif. It has been shown that Swiss-Q046S2fe^ S PTOtem there are six co P ies of the 

MO pnJLJ. and a^aS p™.,^ a. S= S '™T T."™" '™' h »'W^«*»> ««»».*. 
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- Klebsiella pneumoniae protein pqqF. This protein is required for the biosynthesis of the coenzyme pyrrolo-quino- 
line-quinone (PQQ). It is thought to be a protease that cleaves peptide bonds in a small peptide (gene pqqA), thus 
providing the glutamate and tyrosine residues necessary for the synthesis of PQQ. 

Yeast protein AXL1 , which is involved in axial budding [3]. 
Eimeria bovis sporozoite developmental protein. 

- Escherichia coli hypothetical protein yddC and HI1 368, the corresponding Haemophilus influenzae protein. 
Bacillus subtilis hypothetical protein ymxG. 

Caenorhabditis elegans hypothetical proteins C28F5.4 and F56D2. 1 . 

[0914] It should be noted that in addition to the above enzymes, this family also includes the core proteins I and II 
of the mitochondrial bc1 complex (also called cytochrome c reductase or complex III), but the situation as to the activity 
or lack of activity of these subunits is quite complex: 

In mammals and yeast, core proteins I and II lack enzymatic activity. 

In Neurospora crassa and in potato core protein I is equivalent to the beta subunit of MPP. 

In Euglena gracilis, core protein I seems to be active, while subunit II is inactive. 

[0915] These proteins do not share many regions of sequence similarity; the most noticeable is in the N-terminal 
section. This region includes a conserved histidine followed, two residues later, by a glutamate and another histidine. 
In pitrilysin, it has been shown [4] that this H-x-x-E-H motif is involved in enzyme activity; the two histidines bind zinc 
and the glutamate is necessary for catalytic activity. Inactive members of this family have lost from one to three of 
these active site residues. A signature pattern was developed that detects active members of this family as well as some 
inactive members. 
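
The statement that inactive members have lost one to three of the H-x-x-E-H residues can be expressed as a simple count over an aligned pentapeptide. This is only an illustrative sketch: the function name `zinc_site_status` is hypothetical, the example pentapeptides are made up, and retaining all three residues is treated here as a necessary condition for activity, not as proof of it.

```python
def zinc_site_status(pentamer):
    """Count retained residues of the H-x-x-E-H motif in an aligned pentapeptide.

    Returns (retained, complete), where retained is the number of the three
    motif residues (H, E, H) still present and complete is True when all
    three are present.
    """
    if len(pentamer) != 5:
        raise ValueError("expected exactly five residues")
    expected = {0: "H", 3: "E", 4: "H"}  # positions of H, E and H in H-x-x-E-H
    retained = sum(pentamer[i] == aa for i, aa in expected.items())
    return retained, retained == 3

# Made-up pentapeptides for illustration only.
for site in ("HFCEH", "HALDH", "YFLQR"):
    print(site, zinc_site_status(site))
```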

[0916] Consensus pattern: G-x(8,9)-G-x-[STA]-H-[LIVMFY]-[LIVMC]-[DERN]-[HRKL]-[LIVMFAT]-x-[LFSTH]-x- 
[GSTAN]-[GST] [The two H are zinc ligands] [E is the active site residue] Sequences known to belong to this class 
detected by the pattern: ALL active members as well as all MPP alpha subunits and core II subunits. Does not detect 
inactive core I subunits. 

[0917] Note: these proteins belong to family M16 in the classification of peptidases [5]. 

[ 1] Rawlings N.D., Barrett A.J. Biochem. J. 275:389-391(1991). 

[ 2] Braun H.-P., Schmitz U.K. Trends Biochem. Sci. 20:171-175(1995). 

[ 3] Becker A.B., Roth R.A. Proc. Natl. Acad. Sci. U.S.A. 89:3835-3839(1992). 

[ 4] Fujita A., Oka C., Arikawa Y., Katagai T., Tonouchi A., Kuhara S., Misumi Y. Nature 372:567-570(1994). 
[ 5] Rawlings N.D., Barrett A.J. Meth. Enzymol. 248:183-228(1995). 

[0918] 310. Involucrin repeat 

[0919] Eckert RL, Yaffe MB, Crish JF, Murthy S, Rorke EA, Welter JF, J Invest Dermatol 1993;100:613-617. 
[0920] 311. Isochorismatase family. Members of this family are hydrolase enzymes. 

[0921] Romao MJ, Turk D, Gomis-Ruth FX, Huber R, Schumacher G, Mollering H, Russmann L; J Mol Biol 1992; 
226:1111-1130. 

[0922] 312. Inositol monophosphatase family signatures (inositol_P) 

It has been shown [1] that several proteins share two sequence motifs. Two of these proteins are enzymes of the 
inositol phosphate second messenger signaling pathway: - Vertebrate and plant inositol monophosphatase (EC 
3.1.3.25). - Vertebrate inositol polyphosphate 1-phosphatase (EC 3.1.3.57). The function of the other proteins is not 
yet clear: - Bacterial protein cysQ. CysQ could help to control the pool of PAPS (3'-phosphoadenoside 5'-phosphosulfate), 
or be useful in sulfite synthesis. - Escherichia coli protein suhB. Mutations in suhB result in the enhanced synthesis 
of heat shock sigma factor (htpR). - Neurospora crassa protein Qa-X. Probably involved in quinate metabolism. 
- Emericella nidulans protein qutG. Probably involved in quinate metabolism. - Yeast protein HAL2/MET22 [2], involved 
in salt tolerance as well as methionine biosynthesis. - Yeast hypothetical protein YHR046c. - Caenorhabditis 
elegans hypothetical protein F13G3.5. - A Rhizobium leguminosarum hypothetical protein encoded upstream of 
the pss gene for exopolysaccharide synthesis. - Methanococcus jannaschii hypothetical protein MJ0109. It is suggested 
[1] that these proteins may act by enhancing the synthesis or degradation of phosphorylated messenger molecules. 
From the X-ray structure of human inositol monophosphatase [3], it seems that some of the conserved residues are 
involved in binding a metal ion and/or the phosphate group of the substrate. 

[0923] Consensus pattern: [FWV]-x(0,1)-[LIVM]-D-P-[LIVM]-D-[SG]-[ST]-x(2)-[FY]-x- 
[HKRNSTY] [The first D and the T bind a metal ion] 

Consensus pattern: [WV]-D-x-[AC]-[GSA]-[GSAPV]-x-[LIVACP]-[LIV]-[LIVAC]-x(3)-[GH]-[GA]- 



EP 1 033 405 A2 



GMP into IMP [3]. It converts nucleobase, nucleoside and nucleotide derivatives of G to A nucleotides and maintains 
the intracellular balance of A and G nucleotides. IMP dehydrogenase and GMP reductase share many regions of sequence 
similarity. 

[0906] Consensus pattern: [LIVM]-[RK]-[LIVM]-G-[LIVM]-G-x-G-S-[LIVM]-C-x-T [C is the putative IMP-binding residue] 
[ 1] Collart F.R., Huberman E. J. Biol. Chem. 263:15769-15772(1988). 

[ 2] Natsumeda Y., Ohno S., Kawasaki H., Konno Y., Weber G., Suzuki K. J. Biol. Chem. 265:5292-5295(1990). 
[ 3] Andrews S.C., Guest J.R. Biochem. J. 255:35-43(1988). 

[0907] 306. (IPPc) Inositol polyphosphate phosphatase family, catalytic domain 
[0908] [1] York JD, Ponder JW, Chen ZW, Mathews FS, Majerus PW 

SSSS JeBer r ^ AUemaVeWat V ' ** DA ' LT " PW = J Bio. Chem 

' t3J 2hang * Jettef son AB. Auethavekiat V. Maierus PW Proc Natl Acad <*ri 1 1 q a iooc qo. 

FEBS Lett 1991;294:16-18. 

[0909] 307. IQ calmodulin-binding motif 

[1] Xie X, Harrison DH, Schlichting I, Sweet RM, Kalabokis VN, Szent-Gyorgyi AG, Cohen C; Nature 1994;368: 
[2] Rhoads AR, Friedberg F; FASEB J 1997;11:331-340. 

[0910] 308. Inosine-uridine preferring nucleoside hydrolase family signature (IU_nuc_hydro) 

[0911] Consensus pattern: D-x-D-[PT]-[GA]-x-D-D-[TAV]-[VI]-A 

[ p! SL a nl M N n M Tn m ' o 69an ° M - J C " ^nramm V.L Biochemistry 35:5963-5970(1996) 

[ 2] Degano M., Gopaul D.N., Scapin G., Schramm V.L, Sacchettini j.c. Biochem^ 

[0912] 309. (Insulinase) 

Insulinase family, zinc-binding region signature 

(aka Peptidase_M16) 

' ^ f °^^ ase ^ 3.4.24.55) (piifilysin) (gen. plf), a periplasms PTKymd Ih&degftMJessiriall psp- 
^LZ^h , m ^ rt ;'™" Cy, ° pl8S "' m *» "*•*«««,, *~ msrnbfarhB. II » IZ « 
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( 2) Atomi H., Ueda M., Hikida M., Hishida T., Teranishi Y., Tanaka A. J. Biochem. 107:262-266(1990). 
[0890] 299. Initiation factor 2 subunit 

[0891] This family includes initiation factor 2B alpha, beta and delta subunits from eukaryotes, related proteins from 
archaebacteria and IF-2 from prokaryotes. Initiation factor 2 binds to Met-tRNA, GTP and the small ribosomal subunit. 
[0892] [1] Kyrpides NC, Woese CR, Proc Natl Acad Sci U S A 1998;95:3726-3730. 
[0893] 300. Initiation factor 3 signature 

Initiation factor 3 (IF-3) (gene infC) [1] is one of the three factors required for the initiation of protein biosynthesis in 
bacteria. IF-3 is thought to function as a fidelity factor during the assembly of the ternary initiation complex which consists 
of the 30S ribosomal subunit, the initiator tRNA and the messenger RNA. IF-3 binds to the 30S ribosomal subunit; it 
is a basic protein of 141 to 212 residues. The chloroplast initiation factor IF-3(chl) is a protein that enhances the 
poly(A,U,G)-dependent binding of the initiator tRNA to chloroplast ribosomal 30S subunits. In its mature form it is a protein 
of about 400 residues whose central section is evolutionarily related to the sequence of bacterial IF-3 [2]. As a signature 
pattern, a highly conserved region was selected, located in the central section of bacterial IF-3 and of IF-3(chl). 
[0894] Consensus pattern: [KR]-[LIVM](2)-[DN]-[FY]-[GSN]-[KR]-[LIVMFYS]-x-[FY]-[DEQTH]-x(2)-[KRQ]- 

[ 1] Liveris D., Schwartz J.J., Geertman R., Schwartz I. FEMS Microbiol. Lett. 112:211-216(1993). 
[ 2] Lin Q., Ma L., Burkhart W., Spremulli L.L. J. Biol. Chem. 269:9436-9444(1994). 

[0895] 301. Imidazoleglycerol-phosphate dehydratase signatures (IGPD) 

Imidazoleglycerol-phosphate dehydratase (EC 4.2.1.19) is the enzyme that catalyzes the seventh step in the 
biosynthesis of histidine in bacteria, fungi and plants. In most organisms it is a monofunctional protein of about 22 to 29 Kd. 
In some bacteria such as Escherichia coli it is the C-terminal domain of a bifunctional protein that includes a 
histidinol-phosphatase domain [1]. Two signature patterns were developed that each include two consecutive histidine residues. 
[0896] Consensus pattern: [LIVMY]-[DE]-x-H-H-x(2)-E-x(2)-[GCA]-[LIVM]-[STAC]-[LIVM]- 
Consensus pattern: G-x-[DN]-x-H-H-x(2)-E-[STAGC]-x-[FY]-K 

[0897] [ 1] Carlomagno M.S., Chiariotti L., Alifano P., Nappo A.G., Bruni C.B. J. Mol. Biol. 203:585-606(1988). 
[0898] 302. Indole-3-glycerol phosphate synthase signature (IGPS) 

Indole-3-glycerol phosphate synthase (EC 4.1.1.48) (IGPS) catalyzes the fourth step in the biosynthesis of tryptophan: 
the ring closure of 1-(2-carboxy-phenylamino)-1-deoxyribulose into indol-3-glycerol-phosphate. In some bacteria, IGPS 
is a single chain enzyme. In others, such as Escherichia coli, it is the N-terminal domain of a bifunctional enzyme 
that also catalyzes N-(5'-phosphoribosyl)anthranilate isomerase (PRAI) activity, the third step of tryptophan 
biosynthesis. In fungi, IGPS is the central domain of a trifunctional enzyme that also contains a PRAI C-terminal domain and a 
glutamine amidotransferase N-terminal domain. The N-terminal section of IGPS contains a highly conserved region 
which X-ray crystallography studies [1] have shown to be part of the active site cavity. This region was used as a 
signature pattern for IGPS. 

[0899] Consensus pattern: [LIVMFY]-[LIVMC]-x-E-[LIVMFYC]-K-[KRSP]-[STAK]-S-P-[ST]-x(3)-[LIVMFYST]- 
[0900] [ 1] Wilmanns M., Priestle J.P., Niermann T., Jansonius J.N. J. Mol. Biol. 223:477-507(1992). 
[0901] 303. (IL2) Interleukin 2. 31 members 

[0902] 304. (ILVD_EDD) Dihydroxy-acid and 6-phosphogluconate dehydratases. Two dehydratases have been 
shown [1] to be evolutionarily related: - Dihydroxy-acid dehydratase (EC 4.2.1.9) (gene ilvD or ILV3), which catalyzes 
the fourth step in the biosynthesis of isoleucine and valine, the dehydration of 2,3-dihydroxy-isovaleric acid into 
alpha-ketoisovaleric acid. - 6-phosphogluconate dehydratase (EC 4.2.1.12) (gene edd), which catalyzes the first step in the 
Entner-Doudoroff pathway, the dehydration of 6-phospho-D-gluconate into 6-phospho-2-dehydro-3-deoxy-D-gluconate. 
- Escherichia coli hypothetical protein yjhG. Both enzymes are proteins of about 600 amino acid residues. Two 
highly conserved regions have been developed as signature patterns. The first pattern is located in the N-terminal part 
and contains a cysteine that could be involved in the binding of a 2Fe-2S iron-sulfur cluster [2]. The second pattern is 
located in the C-terminal half. 

[0903] Consensus pattern: C-D-K-x(2)-P-[GA]-x(3)-[GA] [The C could be a 2Fe-2S ligand] 
Consensus pattern: [SA]-L-[LIVM]-T-D-[GA]-R-[LIVMF]-S-[GA]-[GAV]-[ST]- 

[0904] [ 1] Egan S.E., Fliege R., Tong S., Shibata A., Wolf R.E. Jr., Conway T. J. Bacteriol. 174:4638-4646(1992). 
[ 2] Velasco J.A., Cansado J., Pena M.C., Kawakami T, Laborda J., Notario V. Gene 137:179-185(1993). 
[0905] 305. IMP dehydrogenase / GMP reductase signature 

IMP dehydrogenase (EC 1.1.1.205) (IMPDH) catalyzes the rate-limiting reaction of de novo GTP biosynthesis, the 
NAD-dependent oxidation of IMP into XMP [1]. Inhibition of IMP dehydrogenase activity results in the cessation of DNA 
synthesis. As IMP dehydrogenase is associated with cell proliferation, it is a possible target for cancer chemotherapy. 
Mammalian and bacterial IMPDHs are tetramers of identical chains. There are two IMP dehydrogenase isozymes in 
humans [2]. GMP reductase (EC 1.6.6.8) catalyzes the irreversible and NADPH-dependent reductive deamination of 
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xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxHHHHHHHHtttHHHHHHHHHxxxxxxxxxx 1 1 1 1 1 1 

[ 3] Gehring W.J. Trends Biochem. Sci. 17:277-280(1992). 

[ 4] Gehring W.J., Hiromi Y. Annu. Rev. Genet. 20:147-173(1986). 

[ 5] Schofield P.N. Trends Neurosci. 10:3-6(1987). 

[0883] 'Homeobox' antennapedia-type protein signature (home2) 

rr 9e a n™^^ 

eight DrosophilaLcgenes^ 

amino acids upstream of the homectoxdcm^The^D^ 

<Antp).abdominal-A(abd-A),defo™^ 

are collectively known as the •aSS 

HOX-A2. A3. M.A5, A6, aJ.TSSTS ?ri rrru W ^ te9eneSareta l^ 
DaCaenorharxMiseleganslin-^^^ «• <*■ C8. Hox-D1.D3. D4 and 

<™amiVofh^ 

[0884] Consensus pattern: [LIVMFE]-[FY]-P-W-M-[KRQTA]- 

[ 1] McGinnis W., Krumlauf R. Cell 68:283-302(1992). 
[ 2] Scott M.P. Cell 71:551-553(1992). 

[0885] 'Homeobox' engrailed-type protein signature (home3) 
subfamily are: - DrosophilasegmeLionpTS^^ 

and is required for the development of the cenkaTn™? , ( } ^ *° ***** se 9mentation pattern 

protehs engrailed and invectS ^ S« e ?;; "T?* ^ < iw > " Silk «* 

and E60. - Grasshopper (Schistocerca aTe^ZSn .^S3^2f?T " *" * *"* " E30 
2 and -3. - Sea urchin (Tripneusteas qratilla) SU Hftwi i a E "" 1 and En ' 2 ' Zebrafis h Eng-1,- 

'°h-l6.Engrai.edhJe<LpS^^ 

residues located at the C-terminal of fhe t^Z^H 1 * 61 80016 20 amin <**W 

stretch of eight perfect* conse^s^T^^^ «°' "* - a 

[0887] Consensus pattern: L-M-A-[EQ]-G-L-Y-N- 

[ 1] Scott M.P., Tamkun J.W., Hartzell G.W. III Biochim. Biophys. Acta 989:25-48(1989). 
[ 2] Gehring W.J. Science 236:1245-1252(1987). 

[0888] 298. Isocitrate lyase signature (ICL) 
^tet^ 

A cysteine, a histidine and a g.ulnate T aS ^T a T b VlT *** *** * f a " d i"*- 

OnVonecysteine residue is Lserved bel^^^^^^^^^ 

in the middle of a conserved Uexapept^Z^Z^^Z ^ P 13 "' and ba ^ria. enzymes; it istocated 
[088* C^sensuspattem^^ 

[ 1] Beeching J.R. Protein Seq. Data Anal. 2:463-466(1989). 

[0871] 296. Histone H2A signature (his1) 

Histone H2A is one of the four histones, along with H2B, H3 and H4, which forms the eukaryotic nucleosome core. 
Using alignments of histone H2A sequences [1,2,E1], a conserved region in the N-terminal part of H2A was selected 
as a signature pattern. This region is conserved both in classical S-phase regulated H2A's and in variant histone H2A's which are 
synthesized throughout the cell cycle. 
[0872] Consensus pattern: [AC]-G-L-x-F-P-V- 

[ 1] Wells D.E., Brown D. Nucleic Acids Res. 19:2173-2188(1991). 

[ 2] Thatcher T.H., Gorovsky M.A. Nucleic Acids Res. 22:174-179(1994). 

[0873] Histone H4 signature (his2) 

[0874] Histone H4 is one of the four histones, along with H2A, H2B and H3, which forms the eukaryotic nucleosome 
core. Along with H3, it plays a central role in nucleosome formation. The sequence of histone H4 has remained almost 
invariant in more than 2 billion years of evolution [1,E1]. The region used as a signature pattern is a pentapeptide found 
in positions 14 to 18 of all H4 sequences. It contains a lysine residue which is often acetylated [2] and a histidine residue 
which is implicated in DNA-binding [3]. 
[0875] Consensus pattern: G-A-K-R-H- 
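
The stated location of the pentapeptide (positions 14 to 18) can be checked with a plain substring search. The sketch below is illustrative only: the 20-residue fragment is an assumption, taken to represent the N-terminus of a mature histone H4 numbered without the initiator methionine, and the function name `signature_span` is hypothetical.

```python
# Assumed N-terminal fragment of a mature histone H4 (numbering without the
# initiator methionine); it is not taken from the source document.
H4_N_TERMINUS = "SGRGKGGKGLGKGGAKRHRK"

def signature_span(sequence, motif="GAKRH"):
    """Return the 1-based (first, last) positions of the motif, or None."""
    start = sequence.find(motif)
    if start == -1:
        return None
    return (start + 1, start + len(motif))

span = signature_span(H4_N_TERMINUS)
print(span)
if span is not None:
    first = span[0]
    # The lysine and the histidine are the third and fifth motif residues.
    print("lysine at", first + 2, "histidine at", first + 4)
```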

[ 1] Thatcher T.H., Gorovsky M.A. Nucleic Acids Res. 22:174-179(1994). 

[ 2] Doenecke D., Gallwitz D. Mol. Cell. Biochem. 44:113-128(1982). 

[ 3] Ebralidse K.K., Grachev S.A., Mirzabekov A.D. Nature 331:365-367(1988). 

[0876] Histone H3 signatures (his3) 

Histone H3 is one of the four histones, along with H2A, H2B and H4, which forms the eukaryotic nucleosome core. It 
is a highly conserved protein of 135 amino acid residues [1,2,E1]. The following proteins have been found to contain 
a C-terminal H3-like domain: - Mammalian centromeric protein CENP-A [3]. Could act as a core histone necessary for 
the assembly of centromeres. - Yeast chromatin-associated protein CSE4 [4]. - Caenorhabditis elegans chromosome 
III encodes two highly related proteins (F54C8.2 and F58A4.3) whose C-terminal section is evolutionarily related to the 
last 100 residues of H3. The function of these proteins is not yet known. Two signature patterns were developed. The 
first one corresponds to a perfectly conserved heptapeptide in the N-terminal part of H3. The second one is derived 
from a conserved region in the central section of H3. 
[0877] Consensus pattern: K-A-P-R-K-Q-L- 
Consensus pattern: P-F-x-[RA]-L-[VA]-[KRQ]-[DEG]-[IV]- 
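
Since two signature patterns are given for H3, a scan can report the hits for each one separately. The following sketch rests on assumptions: the regular expressions are hand translations of the two patterns, the 74-residue fragment is assumed to represent the N-terminal part of a mature histone H3 and does not come from the source, and the names `H3_SIGNATURES` and `scan_signatures` are hypothetical.

```python
import re

# Hand translations of the two H3 consensus patterns given above.
H3_SIGNATURES = {
    "heptapeptide": re.compile(r"KAPRKQL"),
    "central region": re.compile(r"PF.[RA]L[VA][KRQ][DEG][IV]"),
}

def scan_signatures(sequence):
    """Map each signature name to the 1-based start positions of its matches."""
    return {
        name: [m.start() + 1 for m in regex.finditer(sequence)]
        for name, regex in H3_SIGNATURES.items()
    }

# Assumed fragment of a mature histone H3 (illustrative, not from the source).
H3_FRAGMENT = (
    "ARTKQTARKSTGGKAPRKQLATKAARKSAPATGGVKKPHRYRPGTVALREIRRYQKSTELLIRKL"
    "PFQRLVREI"
)
hits = scan_signatures(H3_FRAGMENT)
print(hits)
```

A C-terminal H3-like domain of the kind listed above would be expected to lack the N-terminal heptapeptide, so reporting the two signatures separately is more informative than a single yes/no answer.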

[ 1] Wells D.E., Brown D. Nucleic Acids Res. 19:2173-2188(1991). 

[ 2] Thatcher T.H., Gorovsky M.A. Nucleic Acids Res. 22:174-179(1994). 

[ 3] Sullivan K.F., Hechenberger M., Masri K. J. Cell Biol. 127:581-592(1994). 

[ 4] Stoler S., Keith K.C., Curnick K.E., Fitzgerald-Hayes M. Genes Dev. 9:573-586(1995). 

[0878] Histone H2B signature (his4) 

[0879] Histone H2B is one of the four histones, along with H2A, H3 and H4, which forms the eukaryotic nucleosome 
core. Using alignments of histone H2B sequences [1,2,E1], a conserved region was selected in the C-terminal part 
of H2B. 

[0880] Consensus pattern: [KR]-E-[LIVM]-[EQ]-T-x(2)-[KR]-x-[LIVM](2)-x-[PAG]-[DE]-L-x-[KR]-H-A-[LIVM]-[STA]- 
E-G- 

[ 1] Wells D.E., Brown D. Nucleic Acids Res. 19:2173-2188(1991). 

[ 2] Thatcher T.H., Gorovsky M.A. Nucleic Acids Res. 22:174-179(1994). 

[0881] 297. 'Homeobox' domain signature and profile (home1) 

The 'homeobox' is a protein domain of 60 amino acids [1 to 5,E1] first identified in a number of Drosophila homeotic 
and segmentation proteins. It has since been found to be extremely well conserved in many other animals, including 
vertebrates. This domain binds DNA through a helix-turn-helix type of structure. Some of the proteins which contain a 
homeobox domain play an important role in development. Most of these proteins are known to be sequence-specific 
DNA-binding transcription factors. The homeobox domain has also been found to be very similar to a region of the 
yeast mating type proteins. These are sequence-specific DNA-binding proteins that act as master switches in yeast 
differentiation by controlling gene expression in a cell type-specific fashion. A schematic representation of the 
homeobox domain is shown below. The helix-turn-helix region is shown by the symbols 'H' (for helix) and 't' (for turn). 
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heme-binding domain of cytochrome b5 family 

[0865] Consensus pattern: [FY]-[LIVMK]-x(2)-H-P-[GA]-G [H is a heme axial ligand] 

[1] Ozols J. Biochim. Biophys. Acta 997:121-130(1989). 
[2] Guiard B. EMBO J. 4:3265-3272(1985). 

[6] Levin R.J., Boychuk P.L., Croniger C.M., Kazzaz J.A., Rozek C.E. Nucleic Acids Res. 17:6349-6367(1989). 

[0866] 294. Hexapeptide-repeat containing-transferases signature 
On the bass of sequence simaarity. a numto^ 

These pfoteins are: - Serine acetyltransferase (EC 2 3 1 vn ^ItT f propo ® f* 1 1 1 2 . 3 -4] to belong to a single family, 
synthesis. - Azotobacter chrooc^uTnZen SSSin ^p ( ^> eCySE) - inVOlVed in ^ 

zyme involved in the biosynthesis of lactose - udp mT^ui dtei y nrans,erase (EC23.1.18) (gene lacA), an en- 

an enzyme involved in the bkiymhesis ot %^A a ^ 9 ^T? & ac ^ ans,erase (EC 2.3.1.129 ) (gene IpxA). 
the outer membrane of th cT ZltX-^o^^ ^ ** *° "P^^hande to 

>PXD or firA). whtoh is also t^^ZTS ^ ans,era ~ <«> 2.3.1,) (gene 

2.3.1.28) from Agrobacterium tumefaciens flcE «E P c ' ^'^Phemcol acetyttransferase (CAT) (EC 
i^osa. Staphylococci a Z S^tZ£2£25 T T ^ P *°™*>™»* 
(see <PDOC00093 >V - Rhizobium nodulm tor ^oro^n^ evolutionary related to the main family of CAT 

of tuiS^Lu maftoTe otceS^^^ 

transferase (EC 2.3.1.117) (gene daoDl SETS^L * 7^P' ' tetrahydrodiptoolinate N-succinyl- 

lysine from asp^^^L^ ^Z^^,^ ^ " *' bk>s ^^ * diaminophnelate and 
teenegtmUorgcaDortms^ 

ecus aureus protein capG which is involved in biosvnth^?^ , " pop °7 saccnande biosynthesis. - Staphykxo- 
protein YJL218W. which is highly sfnTrt EsS SESf LlTJ^ P ^ k " *- t 

[ 1] Downie J.A. Mol. Microbiol. 3:1649-1651(1989). 

[ 2] Parent R., Roy P.H. J. Bacteriol. 174:2891-2897(1992). 

[ 3] Vaara M. FEMS Microbiol. Lett. 97:249-254(1992). 

! S ZTohT^ l\™T en M " M ' FEBS Lett * ^:2B9-292(1994). 

[ 5] Raetz C.R.H., Roderick S.L. Science 270:997-1000(1995). 

[0868] 295. Hexokinases signature Hexokinass (Fro 7 1 i \ n oi ~ 

the phosphorylation of keto- and aldohex^Te a duS^in ' m ****** enzyme that catalyzes 

donor. In vertebrates there are fo^SS^fS^JTT ^ 35 *» 

whfch i. often incorrectly designaLTcoS^ ^ ^ "' *"* * Type IV hexoki " a ^ 

important role in modeling insu.in sLtoTs Yv^<Z7££? T ^ beta - Ce " S and P 1 ^ 5 « 

I to III, which have low Km values for^ ucoTe rl™? f ' °' ab0Ut 50 Kd He *°"inases °» types 

very smal. N-termina, hS^SiSS^^S^~ ? abOU | 100 Kd " «"ey .on* tf a 
The first domahhas lost tts^atalytT^ 
ROK1, a yeast protein. - ste13, a fission yeast protein. - vasa, a Drosophila protein important for oocyte formation and
specification of embryonic posterior structures. - Me31B, a Drosophila maternally expressed protein of unknown func-
tion. - dbpA, an Escherichia coli putative RNA helicase. - deaD, an Escherichia coli putative RNA helicase which can
suppress a mutation in the rpsB gene for ribosomal protein S2. - rhlB, an Escherichia coli putative RNA helicase. -
rhlE, an Escherichia coli putative RNA helicase. - srmB, an Escherichia coli protein that shows RNA-dependent ATPase
activity. It probably interacts with 23S ribosomal RNA. - Caenorhabditis elegans hypothetical proteins T26G10.1,
ZK512.2 and ZK686.2. - Yeast hypothetical protein YHR065c. - Yeast hypothetical protein YHR169w. - Fission yeast
hypothetical protein SpAC31A2.07c. - Bacillus subtilis hypothetical protein yxiN. All these proteins share a number of
conserved sequence motifs. Some of them are specific to this family while others are shared by other ATP-binding
proteins or by proteins belonging to the helicases 'superfamily' [4,E1]. One of these motifs, called the 'D-E-A-D-box',
represents a special version of the B motif of ATP-binding proteins. Some other proteins belong to a subfamily which
have His instead of the second Asp and are thus said to be 'D-E-A-H-box' proteins [3,5,6,E1]. Proteins currently known
to belong to this subfamily are: - PRP2, PRP16, PRP22 and PRP43. These yeast proteins are all involved in various
ATP-requiring steps of the pre-mRNA splicing process. - Fission yeast prh1, which may be involved in pre-mRNA splicing.

- Male-less (mle), a Drosophila protein required in males for dosage compensation of X chromosome linked genes. -
RAD3 from yeast. RAD3 is a DNA helicase involved in excision repair of DNA damaged by UV light, bulky adducts or
cross-linking agents. Fission yeast rad15 (rhp3) and mammalian DNA excision repair protein XPD (ERCC-2) are the
homologs of RAD3. - Yeast CHL1 (or CTF1), which is important for chromosome transmission and normal cell cycle
progression in G(2)/M. - Yeast TPS1. - Yeast hypothetical protein YKL078w. - Caenorhabditis elegans hypothetical
proteins C06E1.10 and K03H1.2. - Poxviruses' early transcription factor 70 Kd subunit which acts with RNA polymerase
to initiate transcription from early gene promoters. - I8, a putative vaccinia virus helicase. - hrpA, an Escherichia coli
putative RNA helicase. Signature patterns were developed for both subfamilies.

[0861] Consensus pattern: [LIVMF](2)-D-E-A-D-[RKEN]-x-[LIVMFYGSTN]-
Consensus pattern: [GSAH]-x-[LIVMF](3)-D-E-[ALIV]-H-[NECR] -

Note: proteins belonging to this family also contain a copy of the ATP/GTP-binding motif 'A' (P-loop) (see the relevant
entry <PDOC00017>).

[ 1] Schmid S.R., Linder P. Mol. Microbiol. 6:283-292(1992).

[ 2] Linder P., Lasko P., Ashburner M., Leroy P., Nielsen P.J., Nishi K., Schnier J., Slonimski P.P. Nature 337:121-122
(1989).

[ 3] Wassarman D.A., Steitz J.A. Nature 349:463-464(1991).

[ 4] Hodgman T.C. Nature 333:22-23(1988) and Nature 333:578-578(1988) (Errata).

[ 5] Harosh I., Deschavanne P. Nucleic Acids Res. 19:6331-6331(1991). 

[ 6] Koonin E.V., Senkevich T.G. J. Gen. Virol. 73:989-993(1992). 
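
The consensus patterns in this entry use a compact notation: residues are separated by hyphens, `x` stands for any residue, square brackets list the allowed residues, curly braces list excluded residues, and a number in parentheses is a repeat count. A minimal sketch of how such a pattern could be translated into a regular expression is given below. The function name `prosite_to_regex` and the test fragment are hypothetical and are not part of the original disclosure; the sketch ignores anchors and other notation not used in this entry.

```python
import re

def prosite_to_regex(pattern):
    """Translate a hyphen-separated consensus pattern into a Python regex string."""
    parts = []
    for element in pattern.strip("-. ").split("-"):
        # Split an element into its core and an optional repeat such as (2) or (2,3).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element.strip())
        core, lo, hi = m.group(1), m.group(2), m.group(3)
        if core == "x":
            core = "."                       # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"   # excluded residues
        # Bracketed sets such as [LIVMF] and single letters are already valid regex.
        if lo and hi:
            core += "{%s,%s}" % (lo, hi)
        elif lo:
            core += "{%s}" % lo
        parts.append(core)
    return "".join(parts)

DEAD_BOX = prosite_to_regex("[LIVMF](2)-D-E-A-D-[RKEN]-x-[LIVMFYGSTN]-")
DEAH_BOX = prosite_to_regex("[GSAH]-x-[LIVMF](3)-D-E-[ALIV]-H-[NECR]")

# Hypothetical fragment used only to exercise the pattern.
fragment = "MKLIDEADRMLAAA"
match = re.search(DEAD_BOX, fragment)
print(DEAD_BOX, match.start() + 1 if match else None)
```

The same helper can be applied to the other consensus patterns in this section, provided the pattern text is first checked for transcription errors.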

[0862] 293. Heme-binding domain in cytochrome b5 and oxidoreductases (heme_1)

[0863] Cytochrome b5 is a membrane-bound hemoprotein which acts as an electron carrier for several membrane-
bound oxygenases [1]. There are two homologous forms of b5, one found in microsomes and one found in the outer
membrane of mitochondria. Two conserved histidine residues serve as axial ligands for the heme group. The structure
of a number of oxidoreductases consists of the juxtaposition of a heme-binding domain homologous to that of b5 and
either a flavodehydrogenase or a molybdopterin domain. These enzymes are:

- Lactate dehydrogenase (EC 1.1.2.3) [2], an enzyme that consists of a flavodehydrogenase domain and a heme-
binding domain called cytochrome b2.

- Nitrate reductase (EC 1.6.6.1), a key enzyme involved in the first step of nitrate assimilation in plants, fungi and
bacteria [3,4]. Consists of a molybdopterin domain (see <PDOC00484>), a heme-binding domain called cyto-
chrome b557, as well as a cytochrome reductase domain.

- Sulfite oxidase (EC 1.8.3.1) [5], which catalyzes the terminal reaction in the oxidative degradation of sulfur-con-
taining amino acids. Also consists of a molybdopterin domain and a heme-binding domain.

This family of proteins also includes:

- TU-36B, a Drosophila muscle protein of unknown function [6].
- Fission yeast hypothetical protein SpAC1F12.10c.
- Yeast hypothetical protein YMR073c.
- Yeast hypothetical protein YMR272c.

[0864] A segment which includes the first of the two histidine heme ligands was used as a signature pattern for the
[0848] 287. Histidine biosynthesis protein



' -~^/.iu»cok> yivium 

to other proteins [1 J. Y m ° Vai 01 the acetyl 9 rou *>- H,stone deacetylases are related 

[0853] Leipe DD, Landsman D; Nucleic Acids Res 1997;25:3693-3697.
[0854] 289. Histidinol dehydrogenase signature 

correspond to the part of the enzyme th a » «n -nost h it ^ *r u-T selected. This region does not 

il 1 ^f3709^r E > ^ ' ' ^ S - Cten9 J - Y - J- *~ Na... Acad. Sci. USA 88: 

[ 2] Grubmeyer C.T., Gray W.R. Biochemistry 25:4778-4784(1986).

[0856] 290. Homoserine dehydrogenase signature

XTSe^^ 

participates in the biosynthesis of threonine an! th«£ iiiSJ « f 9 T ,(> homos8 ™- The latter 

as a single chain proton as in some c^teL^d Z L-fn Z * S °' me,hio "™- HDh is found either 
partokinase domain and a cTerrnT. ESSS astn baLT ^r^**™ ° f a " N *>™*' *s- 

f«t»em. the best conserved region o, H^ 

centra, section and that contains J^L ^T^ " * °' 83 ,0 " ,0Ca,ed th ° 

[0857] Consensus pattern: A-x(3)-G-[LIVMFY]-[STAG]-x(2,3)-[DNS]-P-x(2)-D-[LIVM]-x-G-x-D-x(3)-K-

! 2 C^R ri ^ ~ ' SUrdin - Kef i an Y - p EBS Lett. 323:289-293(1 993). 
[ 2] Cami B., Clepet C., Patte J.-C. Biochimie 75:487-495(1993).

[0858] 291. haloacid dehalogenase-like hydrolase

16 M 96 01 Swiss Pg^a Ths M M m , M^™„Z^? »»«v«l »s»n oUhosNgnimw, b«»e«, 

AnSJ? °f"> bo» temfc ATP*pento« Mca M Mgra,,,,., „, eloaM „ 

this family are: - Initiation factor bIfSI ^^T \ unwinding. Proteins currently known to belong to 

complex invoked I^^^IX^S^ * * « " hi9h ^ 

-PRP5andPRP28.These y eas? P ro^ 

- PI10. a mouse protein ex'press'ed speci,"^^^ 

closely related to PH0. - SPP81/DED1 and DBPi two . " " Xeno f 3US P utatl ™ RNA helfcase, 

related to P.10. - Caenorhabdi.is etegans he' case o,hT MSsiTs h P ™" NA Splicin 9 a " d 

- SPB4, a yeas, protein mvolved in L maSnt, 25S rScl, P.NA "TT"* * 

ATPase and DNA-helicase activities in vitro It is invotL £ ' ^ ' ™ n " UC,ear anti9ea P 68 nas 

Putative RNA hefcase reteted to *B.-t^lZZ^,Z£Z^^ " »?* ^ 3 DrOS ° Phte 
protein involved in ribosome assembly - MAK5 a ventral ™ k ^' ' ' ^ ' DRS1 ■ a V east 

°ry. mako, a yeast protein involved in ma.ntenance of dsRNA killer plasmid. - 
section of these proteins; the two others on conserved regions located in the central part of the sequence
[0838] Consensus pattern: [IV]-D-L-G-T-[ST]-x-[SC] -
Consensus pattern: [UVM^LIVMFY>[DNHLIVM^ 

Consensus pattern: [LIVMY]-x4LIVMFJ-x^^-x4ST>x4LIVMJ T P-x-[LIVM]-x-[DEQKRSTAJ- 

[ 1] Lindquist S., Craig E.A. Annu. Rev. Genet. 22:631-677(1988).
[ 2] Pelham H.R.B. Cell 46:959-961(1986).

[ 3] Pelham H.R.B. Nature 332:776-777(1988).

[ 4] Craig E.A. BioEssays 11:48-52(1989).

[ 5] Agranovsky A.A., Boyko V.P., Karasev A.V., Koonin E.V., Dolja V.V. J. Mol. Biol. 217:603-610(1991).

[ 6] Gupta R.S., Singh B. J. Bacteriol. 174:4594-4605(1992).

[ 7] Deshaies R.J., Koch B.D., Schekman R. Trends Biochem. Sci. 13:384-388(1988).
[ 8] Craig E.A., Gross C.A. Trends Biochem. Sci. 16:135-140(1991).

[0839] 283. Heat shock hsp90 proteins family signature 

Prokaryotic and eukaryotic organisms respond to heat shock or other environmental stress by the induction of the 
synthesis of proteins collectively known as heat-shock proteins (hsp) [1]. Amongst them is a family of proteins, with 
an average molecular weight of 90 Kd, known as the hsp90 proteins. Proteins known to belong to this family are: -
Escherichia coli and other bacteria heat shock protein c62.5 (gene htpG). - Vertebrate hsp 90-alpha (hsp 86) and hsp
90-beta (hsp 84). - Drosophila hsp 82 (hsp 83). - Trypanosoma cruzi hsp 85. - Plants Hsp82 or Hsp83. - Yeast and
other fungi HSC82, and HSP82. - The endoplasmic reticulum protein 'endoplasmin' (also known as Erp99 in mouse,
GRP94 in hamster, and hsp 108 in chicken). The exact function of hsp90 proteins is not yet known. In higher eukaryotes,
hsp90 has been found associated with steroid hormone receptors, with tyrosine kinase oncogene products of several
retroviruses, with eIF2alpha kinase, and with actin and tubulin. Hsp90 are probable chaperonins that possess ATPase
activity [2,3]. As a signature pattern for the hsp90 family of proteins, a highly conserved region found in the N-terminal
part of these proteins was selected.

[0840] Consensus pattern: Y-x-[NQH]-K-[DE]-[IVA]-F-L-R-[ED] -
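
As an illustration of how such a signature might be applied, the sketch below scans a protein sequence for the hsp90 pattern and reports each hit with its 1-based start position. The regular expression is a hand translation of the consensus pattern above; the function name `find_hsp90_signature` and the test fragment are hypothetical and serve only to exercise the code.

```python
import re

# Y-x-[NQH]-K-[DE]-[IVA]-F-L-R-[ED], written as a regular expression.
HSP90_SIGNATURE = re.compile(r"Y.[NQH]K[DE][IVA]FLR[ED]")

def find_hsp90_signature(sequence):
    """Return a list of (1-based start, matched segment) for every hit."""
    return [(m.start() + 1, m.group(0)) for m in HSP90_SIGNATURE.finditer(sequence)]

# Hypothetical N-terminal fragment, not taken from the sequence listing.
fragment = "AAQLMSLIINTFYSNKEIFLRELISN"
print(find_hsp90_signature(fragment))
```

A sequence with no hit yields an empty list, so the result can be used directly as a yes/no test for membership in the family.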

[ 1] Lindquist S., Craig E.A. Annu. Rev. Genet. 22:631-677(1988).

[ 2] Nadeau K., Das A., Walsh C.T. J. Biol. Chem. 268:1479-1487(1993).

[ 3] Jakob U., Buchner J. Trends Biochem. Sci. 19:205-211(1994). 

[0841] 284. Helix-turn-helix (HTH3)

[0842] This large family of DNA-binding helix-turn-helix proteins includes Cro Swiss: P03036 and CI Swiss: P03034.
[0843] 285. Heme oxygenase signature

Heme oxygenase (EC 1.14.99.3) (HO) [1] is the microsomal enzyme that, in animals, carries out the oxidation of heme;
it cleaves the heme ring at the alpha methene bridge to form biliverdin and carbon monoxide. Biliverdin is subsequently
converted to bilirubin by biliverdin reductase. In mammals there are three isozymes of heme oxygenase: HO-1 to HO-3.
The first two isozymes differ in their tissue expression and their inducibility: HO-1 is highly inducible by its substrate
heme and by various non-heme substances, while HO-2 is non-inducible. It has been suggested [2] that HO-2 could
be implicated in the production of carbon monoxide in the brain where it is said to act as a neurotransmitter. In the
genome of the chloroplast of red algae as well as in cyanobacteria, there is a heme oxygenase (gene pbsA) that is the
key enzyme in the synthesis of the chromophoric part of the photosynthetic antennae [3]. A heme oxygenase is also
present in the bacterium Corynebacterium diphtheriae (gene hmuO), where it is involved in the acquisition of iron from
the host heme [4]. There is, in the central section of these enzymes, a well conserved region centered on a histidine
residue which is proposed to play a key role in binding the substrate heme at the active center of the enzyme. This
region was used as a signature pattern.

[0844] Consensus pattern: [LIV]-A-H-[STACH]-Y-[ST]-[RT]-Y-[LIVM]-G [H binds the heme]

[ 1] Maines M.D. FASEB J. 2:2557-2568(1988).
[ 2] Barinaga M. Science 259:309-309(1993).

[ 3] Richaud C., Zabulon G. Proc. Natl. Acad. Sci. U.S.A. 94:11736-11741(1997).
[ 4] Schmitt M.P. J. Bacteriol. 179:838-845(1997).

[0845] 286. Hepatitis core antigen. 

[0846] The core antigen of hepatitis viruses possesses a carboxyl terminus rich in arginine. On this basis it was 
predicted that the core antigen would bind DNA [1]. There is some experimental evidence to support this [2].
[0847] [1] Pasek M, Goto T, Gilbert W, Zink B, Schaller H, Mckay P, Leadbetter G, Murray K; Nature 1979;282:
575-579. [2] Gallina A, Bonelli F, Zentilin L, Rindi G, Muttini M, Milanesi G; J Virol 1989;63:4645-4652.
[0831] 280. HSF-type DNA-binding domain signature



assembly and regulation of the oen« ^T«ZT^.' ! FL1, a pro,ein in cell surface 

SKM7 (or BRYiTrS) b^dst h» ^ T" ^ a ^' e ^ l 4 J- - Yeast transcription factor 

expression [5] . ^Z^^^SSS^ , To ^ M ° B ,OC *• * ° 1 ' 

HSF DNA-binding dongas dSffllSS^r' ^ ^ ^ ^ ^ m ~ -~ ^ - »• 
[0833] Consensus pattern: M W K-H^^^ 

[ 1] Sorger P.K. Cell 65:363-366(1991).

[ 2] Mager W.H., Moradas Ferreira P. Biochem. J. 290:1-13(1993).

[ 5] Morgan B.A.. Bouquin N., Merrill G.F.. Johnston L.H. EMBO J. uSrl^tly ' ' 
[0834] 281. Heat shock hsp20 proteins family profile 
^ P Ss^^^^ 

mo.ecu.ar weigh, of i KdZS StS p £ Sffiff J'SE ST ? 3 T* * Pr ° ,einS " ^ 

aggregates; their family is currentVcomix>sea^^ 

induced byavariety of environmen^^^^ 

and BC. - Caenolbditis e.e^Tm^ 

crassa and Aspergillus Nidulans) - Plant small hsn-I p L« hT,f ■ ( 9 yeaS,) ^ ^P 30 (Neurospora 
cytoplasmic, Cass 1.1 which is chi«^fc^iV3S£ E^S" t 8 " 8 ^ hSP2 ° : C ' aSSeS ' and " «** ar ° 
B chains. Alpha-crystaHin is an abundam^en ^^^r 6 ^^^ -^^^^ 
appears to be to maintain the correct r^^J^JTitT, , „ " S mah ,unc,ion 

as a chaperone [6]. - SchistcWrr^T mS J? n '° Und ° ,her ,iSSUes where rt see ™ »° «* 

domains. - A variety of pZ^pSSSTiSSndl? S" 9 "^ Stn>Cti » an * P 40 fe °« two tandem hs P 20 
cum. spore protein SP 2 ? (SSS^S ^ T Escherichia coli - hs P 1 8 Tom Clostridium acetobutyli- 

acerized by the presence of a ccTe^C ,e™S hyP ^ e,, " al P ro,ein MJ0285. Structurally, this famiy is char- 
ters of the h^O ^ISTS^^S ^ ^ ^ '° *« 
[0835] Sequences known to belong to this class detected by the profile: ALL

(1M6M4I J aenickea.CreiS T | C 1^^^ * n " W ^ J T Mo1 " «*»*B 
205-211(1994)1 6] Groenen P.J.T.A , Merck KB de!tono vi w ^ 1 ^ , J ^ uchner J - Trends Biochem. Sci. 19: 
[0836] 282. Heat shock hsp70 proteins family signatures

SsyS^ 

can be found in different cellular compaJmer^ (IS ^^ ^Tl'T'^ ^P^ins 
of the hs P 70 family proteinsare listed beta* r in f^Ih ™t°chondnal, endoplasmic reticulum, etc.). Some 

as the dnaK proteia A second h^^L - ? °^ "f* ** n " h hSf>70 P '° ,ein " " 

genome of red algae. - In yeast at least toJh^^T- ? dlsco vered. dnaK is also found in the chloroplast 

SSD1 (KAR2), SSE1 (MS^S^JSS^i^.? 6- * SSA1 SSB1 ' SSB2 ' ^ 

HSP68. and HSC-1 to HSC-6 >nJ~ZL- Drosoph,la - tnere are at ■«* eight different hsp70 croteins- HSP70 
GRP78 (also known aVth^mniunc^ HSPA1 to^SPAe. HSC^and 

a hs P 70 homolog has been show^fSl toex^Tnr^T ? ^ " ,he ^ beel * B,tow v " us < S BYV). 
belonging to the ns P 70 fami.y b^ATP A variety SEEZZ? ^T*™ " ^ ™ M ^ 



toy 
of structural genes for phosphorus acquisition. - Fission yeast protein esc1 which is involved in the sexual differentiation
process. The schematic representation of the helix-loop-helix domain is shown here:

xxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxx Amphipathic helix 1 Loop Amphipathic helix 2. 

The signature pattern developed to detect this domain spans completely the second amphipathic helix. 

[0818] Consensus pattern: [DENSTAP]-[KTR]-[LIVMAGSNT]-{FYWCPHKR}-[LIVT]-[LIVM]-x(2)-[STAV]-[LIVMSTACKR]-x-[VMFYH]-[LIVMTA]-{P}-{P}-[LIVMRKHQ]

[ 1] Murre C., McCaw P.S., Baltimore D. Cell 56:777-783(1989).
[ 2] Garrel J., Campuzano S. BioEssays 13:493-498(1991).
[ 3] Kato G.J., Dang C.V. FASEB J. 6:3065-3072(1992).

[ 4] Krause M., Fire A., Harrison S.W., Priess J., Weintraub H. Cell 63:907-919(1990).
[ 5] Riechmann V., van Cruechten I., Sablitzky F. Nucleic Acids Res. 22:749-755(1994).

[0819] 276. HMG14 and HMG17 signature

High mobility group (HMG) proteins are a family of relatively low molecular weight nonhistone components in chromatin.
HMG14 and HMG17 [1], two related proteins of about 100 amino acid residues, bind to the inner side of the nucleosomal
DNA thus altering the interaction between the DNA and the histone octamer. These two proteins may be involved in
the process which maintains transcribable genes in a unique chromatin conformation. The trout nonhistone chromo-
somal protein H6 (histone T) also belongs to this family. As a signature pattern a conserved stretch of 10 residues
located in the N-terminal section of HMG14 and HMG17 was selected.
[0820] Consensus pattern: R-R-S-A-R-L-S-A-[RK]-P- 

[0821] [ 1] Bustin M., Reeves R. Prog. Nucleic Acid Res. Mol. Biol. 54:35-100(1996).
[0822] 277. Hydroxymethylglutaryl-coenzyme A lyase active site (HMGL1)

3-hydroxy-3-methylglutaryl-coenzyme A lyase (HMG-CoA lyase or HL) (EC 4.1.3.4) catalyzes the transformation of
HMG-CoA into acetyl-CoA and acetoacetate. In vertebrates it is a mitochondrial enzyme which is involved in ketogenesis
and in leucine catabolism [1]. In some bacteria, such as Pseudomonas mevalonii, it is involved in mevalonate catab-
olism (gene mvaB). A cysteine has been shown [2], in mvaB, to be required for the activity of the enzyme. The region
around this residue is perfectly conserved and is used as a signature pattern.
[0823] Consensus pattern: S-V-A-G-L-G-G-C-P-Y [C is the active site residue]- 
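
Since this signature contains no ambiguous positions, it can be located with a plain substring search, and the position of the active site cysteine follows from its fixed offset within the signature. The sketch below illustrates this; the function name `active_site_cysteine` and the test fragment are hypothetical.

```python
# The signature S-V-A-G-L-G-G-C-P-Y written as a plain string.
HMGL_SIGNATURE = "SVAGLGGCPY"
# Zero-based offset of the active site cysteine inside the signature.
CYS_OFFSET = HMGL_SIGNATURE.index("C")

def active_site_cysteine(sequence):
    """Return the 1-based position of the active site cysteine, or None if absent."""
    start = sequence.find(HMGL_SIGNATURE)
    if start == -1:
        return None
    return start + CYS_OFFSET + 1

# Hypothetical fragment used only to exercise the function.
print(active_site_cysteine("MKTSVAGLGGCPYAK"))
```

Reporting the residue position rather than only the presence of the signature makes it straightforward to check whether a candidate sequence retains the catalytic cysteine.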

[ 1] Mitchell G.A., Robert M.-F., Hruz P.W., Wang S., Fontaine G., Behnke C.E., Mende-Mueller L.M., Schappert
K., Lee C., Gibson K.M., Miziorko H.M. J. Biol. Chem. 268:4376-4381(1993).
[ 2] Hruz P.W., Narasimhan C., Miziorko H.M. Biochemistry 31:6842-6847(1992).

[0824] Alpha-isopropylmalate and homocitrate synthases signatures (HMGL2) 

The following enzymes have been shown [1] to be functionally as well as evolutionarily related: - Alpha-isopropylmalate
synthase (EC 4.1.3.12) which catalyzes the first step in the biosynthesis of leucine, the condensation of acetyl-CoA
and alpha-ketoisovalerate to form 2-isopropylmalate. - Homocitrate synthase (EC 4.1.3.21) (gene nifV) which
is involved in the biosynthesis of the iron-molybdenum cofactor of nitrogenase and catalyzes the condensation of 
acetyl-CoA and alpha-ketoglutarate into homocitrate. - Soybean late nodulin 56. - Methanococcus jannaschii hypo- 
thetical proteins MJ0503, MJ1195 and MJ1392. Two conserved regions were selected as signature patterns for these 
enzymes. The first region is located in the N-terminal section while the second region is located in the central section 
and contains two conserved histidine residues which could be implicated in the catalytic mechanism. 
[0825] Consensus pattern: L-R-[DE]-G-x-Q-x(10)-K- 
Consensus pattern: [LIVMFW]-x(2)-H-x-H-[DN]-D-x-G-x-[GAS]-x-[GASLI]-
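
Where a family is described by two signatures, a candidate sequence can be tested against both. The sketch below, given only as an illustration, reports the 1-based start of each signature and whether both are present. The regular expressions are hand translations of the two consensus patterns above; the function names and the synthetic test sequence are hypothetical.

```python
import re

# L-R-[DE]-G-x-Q-x(10)-K (N-terminal region)
N_TERMINAL = re.compile(r"LR[DE]G.Q.{10}K")
# [LIVMFW]-x(2)-H-x-H-[DN]-D-x-G-x-[GAS]-x-[GASLI] (central region with two histidines)
CENTRAL = re.compile(r"[LIVMFW].{2}H.H[DN]D.G.[GAS].[GASLI]")

def signature_hits(sequence):
    """Return the 1-based start of each signature, or None where it is absent."""
    hits = {}
    for name, regex in (("n_terminal", N_TERMINAL), ("central", CENTRAL)):
        m = regex.search(sequence)
        hits[name] = m.start() + 1 if m else None
    return hits

def has_both_signatures(sequence):
    """True only if both the N-terminal and the central signature are found."""
    hits = signature_hits(sequence)
    return hits["n_terminal"] is not None and hits["central"] is not None

# Synthetic sequence built to contain both regions; not a real protein.
synthetic = "M" + "LRDGEQ" + "A" * 10 + "K" + "TTTT" + "LAAHCHNDLGLAVA"
print(signature_hits(synthetic), has_both_signatures(synthetic))
```

Requiring both hits is a stricter criterion than either signature alone; whether that is appropriate depends on how the signatures are to be used.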
[0826] [ 1] Wang S.-Z., Dean D.R., Chen J.-S., Johnson J.L. J. Bacteriol. 173:3041-3046(1991).
[0827] 278. (HMG COA synt) Hydroxymethylglutaryl-coenzyme A synthase active site

Hydroxymethylglutaryl-coenzyme A synthase (EC 4.1.3.5) (HMG-CoA synthase) catalyzes the condensation of acetyl-CoA
with acetoacetyl-CoA to produce HMG-CoA and CoA [1]. In vertebrates there are two isozymes located in different
subcellular compartments: a cytosolic form which is the starting point of the mevalonate pathway which leads to
cholesterol and other sterolic and isoprenoid compounds and a mitochondrial form responsible for ketone body
biosynthesis. HMG-CoA synthase is also found in other eukaryotes such as insects, plants and fungi. A cysteine is
known to act as the catalytic nucleophile in the first step of the reaction, the acetylation of the enzyme by acetyl-CoA.
The conserved region around this active site residue was used as a signature pattern.

[0828] Consensus pattern: N-x-[DN]-[IV]-E-G-[IV]-D-x(2)-N-A-C-[FY]-x-G (C is the active site residue)- 

[0829] [ 1] Rokosz L.L., Boulton D.A., Butkiewicz E.A., Sanyal G., Cueto M.A., Lachance P.A., Hermes J.D. Arch.
Biochem. Biophys. 312:1-13(1994).

[0830] 279. HMG (high mobility group) box 
Huebner K. Biochemistry 35:11529-11535(1996)

aiSSS^: GarriS ° n P ' GilmOUr D - Rin9e °- PetSk ° GA> LOW8nS,9in J M " Nat arurt. OoL 4: 

[0817] 275. Myc-type, 'helix-loop-helix' dimerization domain signature (HLH)

A number of eukaryotic proteins, which probably are sequence specific DNA-binding proteins that act as transcription
factors, shareacc^serveddor^inof 40 toSOam^oac^ 

of two ^amph.pathchel.cesioinedbyavariable.ength linker region that couWfcZ^TUiSS^^ 
2? M Z P T m dimeri2ation ™* has ««" *** the proteins listed betowlbjM M^otSese pr" 

So t^r of ? out : 5 amino acid residues ,nat iS adjaC6nt to •» h£h*J^53S 

binds to DNA. They are referred to as basic helix-loop-helix proteins (bHLH), and are classified in two groups: class A
(ub«,u ous)andc.ass B^ssue-specffic). Me^ 

also referred to as the E-box motif. The homo- or heterodimerization mediated by the HLH domain is independent of,
but necessary for DNA binding, as two basic regions are required for DNA binding activity. The HLH proteins

ShlTSr? "I' 38 ne9atiVe r69U,at0rS ShCe the y *"» "«e^. but fail to uSSS Se 
ha.ry-retated proteins (ha.ry, E(spl). deadpan) also repress transcription although they can bind DNA Th! . Drotlsof 
this sub amjy act together with co-repressor proteins, like groucho. through their C-termina. moM WRFW 2 m£ 
tam.ly of cellular oncogenes [4]. which » currentJy known to contain four members: c+nyc [E3J, N-myc TLc^id £ 

Z«l .St** 77 are , T l ° ^ 3 r ° ,e h Ce " U,ar diffe ^°n and proliferaSn. ^oteinsTnv^ in myt 
genes* (the .nduction of muscle cells). In mammals MyoDI (MyM), myogenin (MyM) Myf- 5 27Jm7<Z?JZ 
here*, ,n birds CMD1 (QMF-1 ), in Xenopus MyoD and MF25. ^CaenorrSditis ile^s cX TarZo^Ja 

StSr complex with myc or mad. - Vertebrate mJSSJpSS 

!>vhr^ (ARNT), single-minded homologs (SIM1 and SIM2), hypoxia-inducible factor 1 alp^re (HIF1 A) aIh rt&eltor 
1^ ^^P 88 ,*™" P"**» (NPAS1 and NPAS2), nk^^damto^i^v^^ 
and human BMAL1. In drosophite, singled (SIM), AH receptor nuclear translator (ARNT^ tracr^eL plotefri 

pendent transition .n collaboration with E47. - Mammalian stem cell protein (SCL) (also known TsW a ™2n 
wh«h may pte y an rnportan, role in hemopoiet* differentiation. SCL is invoked, by chrom^HrSILitoT 
*«na h ^ 6m ki " ^ amma ^ an P T & e>r <s Idl to Id4 [5J. Id (inhibitor of DNA binding) proteins la^k atesk^DNA^ i^ino 
StTlclr, f r T he,er ° dime ' S ^ other H LH proteins, thereby inhiSlg biding to ml ofotlib 
SSC^T T (emC) Pr0te ' n, P artici P a,es " ^nsory organ patterning by antagonizing Jie narrate 
actrvrty of the achaete- scute complex. Emc is the homolog of mammalian Id proteins - Human Sterol Rec3™ 

. r a, repressor, - Drosophifc «j£j p^ 5^£SS ~iZZ^t^"j^ 

« » involved m chromosomal segregation. It binds to a highly conserved DNA sequence found in cL,!™ * 1 

nteracts wrth the upstream actrvatmg sequence of several acid phosphatase genes. - Yeast serine-rich proteh TyS 
that .srequ.red for mediated ADH2expression.- N eurosporacrass,nuc-,,a pr oteint 
[0806] 270. Glyceraldehyde 3-phosphate dehydrogenase active site (gpdh)

Glyceraldehyde 3-phosphate dehydrogenase (EC 1.2.1.12) (GAPDH) [1] is a tetrameric NAD-binding enzyme common
to both the glycolytic and gluconeogenic pathways. A cysteine in the middle of the molecule is involved in forming a
covalent phosphoglycerol thioester intermediate. The sequence around this cysteine is totally conserved in eubacterial
and eukaryotic GAPDHs and is also present, albeit in a variant form, in the otherwise highly divergent archaebacterial
GAPDH [2]. Escherichia coli D-erythrose 4-phosphate dehydrogenase (E4PDH) (gene epd or gapB) is an enzyme highly
related to GAPDH [3].

[0807] Consensus pattern: [ASV]-S-C-[NT]-T-x(2)-[LIM] [C is the active site residue]-
[ 1] Harris J.I., Waters M. (In) The Enzymes (3rd edition) 13:1-50(1976).

[ 2] Fabry S., Lang J., Niermann T., Vingron M., Hensel R. Eur. J. Biochem. 179:405-413(1989).
[ 3] Zhao G., Pease A.J., Bharani N., Winkler M.E. J. Bacteriol. 177:2804-2812(1995).

[0808] 271. Granulins signature

Granulins [1] are a family of cysteine-rich peptides of about 6 Kd which may have multiple biological activities. A precursor
protein (known as acrogranin) potentially encodes seven different forms of granulin (grnA to grnG) which are probably
released by post-translational proteolytic processing. A schematic representation of the structure of a granulin is shown
below:

xxxCxxxxxCxxxxxCCxxxxxx...

'C': conserved cysteine probably involved in a disulfide bond. '*': position of the pattern. Granulins are evolutionarily
related to PMP-D1, a peptide extracted from the pars intercerebralis of migratory locusts [2].

[0809] Consensus pattern: C-x-D-x(2)-H-C-C-P-x(4)-C [The four C's are probably involved in disulfide bonds]-

[ 1] Bhandari V., Palfree R.G., Bateman A. Proc. Natl. Acad. Sci. U.S.A. 89:1715-1719(1992).
[ 2] Nakakura N., Hietter H., van Dorsselaer A., Luu B. Eur. J. Biochem. 204:147-153(1992).

[0810] 272. (HCV RdRp) Hepatitis C virus RNA dependent RNA polymerase

[0811] The RNA dependent RNA polymerase is also known as non-structural protein NS5B. NS5B is a 65 kDa protein 
that resembles other viral RNA polymerases. HCV replication is thought to occur in membrane bound replication com- 
plexes. These complexes transcribe the positive strand and the resulting minus strand is used as a template for the 
synthesis of genomic RNA. There are two viral proteins involved in the reaction, NS3 and NS5B [1,2].
[0812] [1] Lohmann V, Korner F, Herian U, Bartenschlager R; J Virol 1997;71:8416-8428.
[2] Behrens SE, Tomei L, De Francesco R; EMBO J 1996;15:12-22.
[3] Ishido S, Fujita T, Hotta H; Biochem Biophys Res Commun 1998;244:35-40.
[0813] 273. (HHH) Helix-hairpin-helix motif.

[0814] [1] Doherty AJ, Serpell LC, Ponting CP; Nucleic Acids Res 1996;24:2488-2497.
[0815] 274. HIT family signature 

Recently a family of small proteins of about 12 to 16 Kd has been described [1]. This family currently consists of: -
Mammalian protein HINT (also known as Protein kinase C inhibitor 1 or PKCI-1). HINT was incorrectly thought to be
a specific inhibitor of PKC. It has been shown to bind zinc. - Fission yeast diadenosine 5',5'''-P1,P4-tetraphosphate
asymmetrical hydrolase (Ap4Aase) (EC 3.6.1.17) [2] (gene aph1), which cleaves A-5'-PPPP-5'A to yield AMP and ATP.
- FHIT, a human protein whose gene is altered in different tumors and which acts [3] as a diadenosine 5',5'''-P1,P3-tri-
phosphate hydrolase (Ap3Aase) (EC 3.6.1.29) cleaving A-5'-PPP-5'A to yield AMP and ADP. - Yeast proteins HNT1
and HNT2. - Maize zinc-binding protein ZBP14. - Escherichia coli hypothetical protein ycfF. - Haemophilus influenzae
hypothetical protein HI0961. - Helicobacter pylori hypothetical protein HP0404. - Methanococcus jannaschii hypothet-
ical protein MJ0866. - Mycobacterium leprae hypothetical protein U296A. - Synechocystis strain PCC 6803 hypothetical
protein slr1234. - Caenorhabditis elegans hypothetical protein F21C3.3. - A hypothetical 13.2 Kd protein in hisE 3'region
in Azospirillum brasilense. - A hypothetical 13.1 Kd protein in p37 5'region in Mycoplasma hyorhinis. - A hypothetical
12.4 Kd protein in psbAII 5'region in Synechococcus strain PCC 7942. All these proteins contain a region with three
clustered histidines. This region is responsible for the designation of this family: HIT, for 'HIstidine Triad' [1]. This region
was originally thought to be involved in the binding of a zinc ion but was later identified [4] as part of the alpha-phosphate
binding site of a nucleotide-binding domain. As a signature pattern, the region of the histidine triad was selected.
[0816] Consensus pattern: [NQA]-x(4)-[GAV]-x-[QF]-x-[LIVM]-x-H-[LIVMFYT]-H-[LIVMFT]-H-[LIVMF](2)-[PSGA]
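
The three histidines that give the HIT family its name sit at fixed positions inside the signature, so a match can also report where the triad lies in a candidate sequence. The following is a minimal sketch under the assumption that the consensus pattern reads as transcribed above; the function name `histidine_triad_positions` and the synthetic test sequence are hypothetical.

```python
import re

# The signature as a regular expression, with each triad histidine in its own group.
HIT_SIGNATURE = re.compile(
    r"[NQA].{4}[GAV].[QF].[LIVM].(H)[LIVMFYT](H)[LIVMFT](H)[LIVMF]{2}[PSGA]"
)

def histidine_triad_positions(sequence):
    """Return the 1-based positions of the three triad histidines, or None if no match."""
    m = HIT_SIGNATURE.search(sequence)
    if m is None:
        return None
    return tuple(m.start(group) + 1 for group in (1, 2, 3))

# Synthetic sequence constructed to satisfy the pattern; not a real protein.
print(histidine_triad_positions("MKNAAAAGAQALAHVHVHLIP"))
```

In the synthetic example the histidines are separated by single hydrophobic residues, which is the arrangement the signature requires.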

[ 1] Seraphin B. DNA Seq. 3:177-179(1992).

[ 2] Huang Y., Garrison P.N., Barnes L.D. Biochem. J. 312:925-932(1995).

[ 3] Barnes L.D., Garrison P.N., Siprashvili Z., Guranowski A., Robinson A.K., Ingram S.W., Croce C.M., Ohta M.,



EP 1 033 405 A2 

One of the conserved regions in these enzymes is centered on a conserved aspartic acid residue which has been

Ssite^r Pattem: t™*****^ [D b the 

[ 1] Henrissat B. Biochem. J. 280:309-316(1991). 

[ 2] Castle LA., Smith K.D., Morris R.O. J. Bacteriol. 174:1478-1486(1992). 
[ 3] Bause E., Legler G. Biochim. Biophys. Acta 626:459-465(1980).

[0803] 268. Glycosyl hydrolases family 8 signature 

The microbial degradation of cellulose and xylans requires several types of enzymes such as endoglucanases (EC
3.2.1.4), cellobiohydrolases (EC 3.2.1.91) (exoglucanases), or xylanases (EC 3.2.1.8) [1,2].

£2? 2?i! ™ ^ ,hese fami,ies is to™ as me cel,ubs8 <™» D pj « as 

erKlc<iucleasecnrK^-BacillusstrainKSM-330acidfcendcflucleasaKrFnHr^ r= ooacier xyimum 

C = sus pattern: MST]-D- [ A Gl -D. (2 H.M > A-x- [ SAHL.V M HL.VM Gh x-A- xSh^SL, D is an ac,„e s*e 
[ 1] Beguin P. Annu. Rev. Microbiol. 44:219-248(1990) 

[ 3] Henrissat B., Claeyssens M., Tomme P., Lemesle L., Mornon J.-P. Gene 81:83-95(1989).
[ 4] Henrissat B. Biochem. J. 280:309-316(1991).
[ 5] Alzari P.M., Souchon H., Dominguez R. Structure 4:265-275(1996).

[0804] 269. Glycosyl hydrolases family 9 active sites signatures 

jest ,am "" s r * ss^ssi's; 

^viceose i) (celZ). - Cbstndium thermocellum endoglucanases D (celD) F (celR and I r»in rikr^ ♦ 
succmoqenes endoolucanatA A /onHA\ d™..^ „ i** 01 ^. r tceir-j ana I (cell). - Fibrobacter 

[0805] Consensus pattern: [STV]-x-[LIVMFY]-|STVI-x(2)-G-x-[NKRl-xl4^ IPI I vmi u v orZi- . 
Consensus pattern: Wx- D -x<4 HFY ^ 

[ 1] Beguin P. Annu. Rev. Microbiol. 44:219-248(1990).

! 3! HenTs^' rT" 8831 D ' Q " Mi "*' * C " Jr " Warren R A J - Micra "io.. Rev. 55:303-315(1991) 

[ 3] Henrissat B., Claeyssens M., Tomme P., Lemesle L., Mornon J.-P. Gene 81:83-95(1989).
[ 4] Henrissat B. Biochem. J. 280:309-316(1991).

[ 5] Tomme P., Chauvaux S., Beguin P., Millet J., Aubert J.-P., Claeyssens M. J. Biol. Chem. 266:10313-10318
[ 6] Tomme P., van Beeumen J., Claeyssens M. Biochem. J. 285:319-324(1992).
[0796] 265. Glycosyl hydrolases family 1 signatures

It has been shown [1 to 4] that the following glycosyl hydrolases can be, on the basis of sequence similarities, classified
into a single family: - Beta-glucosidases (EC 3.2.1.21) from various bacteria such as Agrobacterium strain ATCC 21400,
Bacillus polymyxa, and Caldocellum saccharolyticum. - Two plant (clover) beta-glucosidases (EC 3.2.1.21). - Two
different beta-galactosidases (EC 3.2.1.23) from the archaebacterium Sulfolobus solfataricus (genes bgaS and lacS). -
6-phospho-beta-galactosidases (EC 3.2.1.85) from various bacteria such as Lactobacillus casei, Lactococcus lactis,
and Staphylococcus aureus. - 6-phospho-beta-glucosidases (EC 3.2.1.86) from Escherichia coli (genes bglB and ascB)
and from Erwinia chrysanthemi (gene arbB). - Plant myrosinases (EC 3.2.3.1) (sinigrinase) (thioglucosidase). - Mam-
malian lactase-phlorizin hydrolase (LPH) (EC 3.2.1.108 / EC 3.2.1.62). LPH, an integral membrane glycoprotein, is
the enzyme that splits lactose in the small intestine. LPH is a large protein of about 1900 residues which contains four
tandem repeats of a domain of about 450 residues which is evolutionarily related to the above glycosyl hydrolases. One
of the conserved regions in these enzymes is centered on a conserved glutamic acid residue which has been shown
[5], in the beta-glucosidase from Agrobacterium, to be directly involved in glycoside bond cleavage by acting as a
nucleophile. This region was used as a signature pattern. As a second signature pattern a conserved region was
selected, found in the N-terminal extremity of these enzymes; this region also contains a glutamic acid residue.
[0797] Consensus pattern: [LIVMFSTC]-[LIVFYS]-[LIV]-[LIVMST]-E-N-G-[LIVMFAR]-[CSAGN] [E is the active site
residue]

Note: this pattern will pick up the last two domains of LPH; the first two domains, which are removed from the LPH
precursor by proteolytic processing, have lost the active site glutamate and may therefore be inactive [4].
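
The note above implies that a multi-domain protein can carry the active site signature more than once. The sketch below, offered only as an illustration, lists the 1-based position of the active site glutamate for every non-overlapping occurrence of the first signature. The regular expression is a hand translation of that consensus pattern; the function name `active_site_glutamates` and the synthetic two-domain sequence are hypothetical.

```python
import re

# [LIVMFSTC]-[LIVFYS]-[LIV]-[LIVMST]-E-N-G-[LIVMFAR]-[CSAGN], with the glutamate captured.
GH1_ACTIVE_SITE = re.compile(r"[LIVMFSTC][LIVFYS][LIV][LIVMST](E)NG[LIVMFAR][CSAGN]")

def active_site_glutamates(sequence):
    """Return the 1-based position of the glutamate in every signature occurrence."""
    return [m.start(1) + 1 for m in GH1_ACTIVE_SITE.finditer(sequence)]

# Synthetic sequence with two copies of the signature; not a real protein.
synthetic = "AAA" + "LYITENGMG" + "AAAA" + "IYVTENGIA" + "AA"
print(active_site_glutamates(synthetic))
```

A domain that has lost the glutamate, as described for the first two LPH domains, would simply not appear in the returned list.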
[0798] Consensus pattern: F-x-[FYWM]-[GSTA]-x-[GSTA]-x-[GSTA](2)-[FYNH]-[NQ]-x-E-x-[GSTA]-

[ 1] Henrissat B. Biochem. J. 280:309-316(1991). 

[ 2] Henrissat B. Protein Seq. Data Anal. 4:61-62(1991). 

[ 3] Gonzalez-Candelas L, Ramon D., Polaina J. Gene 95:31-38(1990). 

[ 4] El Hassouni M., Henrissat B., Chippaux M., Barras F. J. Bacteriol. 174:765-777(1992).

[ 5] Withers S.G., Warren R.A.J., Street I.P., Rupitz K., Kempton J.B., Aebersold R. J. Am. Chem. Soc. 112:
5887-5889(1990).

[0799] 266. Glycosyl hydrolases family 2 signatures 

It has been shown [1,2,E1] that the following glycosyl hydrolases can be, on the basis of sequence similarities, classified
into a single family: - Beta-galactosidases (EC 3.2.1.23) from bacteria such as Escherichia coli (genes lacZ and ebgA),
Clostridium acetobutylicum, Clostridium thermosulfurogenes, Klebsiella pneumoniae, Lactobacillus delbrueckii, or
Streptococcus thermophilus and from the fungus Kluyveromyces lactis. - Beta-glucuronidase (EC 3.2.1.31) from
Escherichia coli (gene uidA) and from mammals. One of the conserved regions in these enzymes is centered on a
conserved glutamic acid residue which has been shown [3], in Escherichia coli lacZ, to be the general acid/base catalyst
in the active site of the enzyme. This region was used as a signature pattern. As a second signature pattern a highly
conserved region was selected, located some sixty residues upstream from the active site glutamate.
[0800] Consensus pattern: N-x-[LIVMFYWD]-R-[STACN](2)-H-Y-P-x(4)-[LIVMFYWS](2)-x(3)-[DN]-x(2)-G-[LIVMFYW](4)

Consensus pattern: [DENQLF]-[KRVW]-N-[HRY]-[STAPV]-[SAC]-[LIVMFS](3)-W-[GS]-x(2,3)-N-E [E is the active site
residue]
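
The bracketed remark identifies the final E of the second pattern as the active site residue. As a minimal sketch, assuming that pattern reads [DENQLF]-[KRVW]-N-[HRY]-[STAPV]-[SAC]-[LIVMFS](3)-W-[GS]-x(2,3)-N-E, a hand-written regular expression with a named group can report where that glutamate falls in a sequence. The names GH2_ACID_BASE and active_site_positions and the example sequences are hypothetical illustrations.

```python
import re

# Hand translation of the second family 2 pattern; x(2,3) becomes .{2,3}
# and the annotated active-site glutamate is captured in a named group.
GH2_ACID_BASE = re.compile(
    r"[DENQLF][KRVW]N[HRY][STAPV][SAC][LIVMFS]{3}W[GS].{2,3}N(?P<active>E)"
)


def active_site_positions(sequence: str) -> list[int]:
    """Return the 1-based position of the captured glutamate for every match."""
    return [m.start("active") + 1 for m in GH2_ACID_BASE.finditer(sequence)]


# Hypothetical toy sequences with a two-residue and a three-residue spacer.
for toy in ("MADKNHSSLIVWGAANEQ", "DKNHSSLIVWGAAANE"):
    print(toy, active_site_positions(toy))
```

Because the spacer before N-E may be two or three residues long, the offset of the glutamate from the start of the match is not fixed, which is why the sketch reads it from the named group instead of adding a constant.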

[ 1] Henrissat B. Biochem. J. 280:309-316(1991).

[ 2] Schroeder C.J., Robert C., Lenzen G., McKay L.L., Mercenier A. J. Gen. Microbiol. 137:369-380(1991).
[ 3] Gebler J.C., Aebersold R., Withers S.G. J. Biol. Chem. 267:11126-11130(1992).

[0801] 267. Glycosyl hydrolases family 3 active site 

It has been shown [1,2] that the following glycosyl hydrolases can be, on the basis of sequence similarities, classified
into a single family:

- Beta glucosidases (EC 3.2.1.21) from the fungi Aspergillus wentii (A-3), Hansenula anomala, Kluyveromyces fragilis,
Saccharomycopsis fibuligera (BGL1 and BGL2), Schizophyllum commune and Trichoderma reesei (BGL1).
- Beta glucosidases from the bacteria Agrobacterium tumefaciens (Cbg1), Butyrivibrio fibrisolvens (bglA), Clostridium
thermocellum (bglB), Escherichia coli (bglX), Erwinia chrysanthemi (bgxA) and Ruminococcus albus.
- Alteromonas strain 0-7 beta-hexosaminidase A (EC 3.2.1.52).
- Bacillus subtilis hypothetical protein yzbA.
- Escherichia coli hypothetical protein ycfO and HI0959, the corresponding Haemophilus influenzae protein.
[0787] Plant hemoglobins signature (globin2)

Leghemoglobins [1] are hemoproteins present in the root nodules of leguminous plants. Leghemoglobins are structurally
and functionally related to hemoglobin and myoglobin. By providing oxygen to the bacteroids they are essential for
symbiotic nitrogen fixation. Structurally related hemoglobins from the nodules of non-leguminous plants [2,3] (and from
the roots of non-nodulating plants [4]) have been recently sequenced. A signature pattern was developed that matches
the sequence of plant hemoglobins exclusively.

[0788] Consensus pattern: [SN]-P-x-L-x(2)-H-A-x(3)-F- 

[ 1] Powell R., Gannon F. BioEssays 9:117-121(1988).

[ 2] Kortt A.A., Trinick M.J., Appleby C.A. Eur. J. Biochem. 175:141-149(1988).

[ 3] Kortt A.A., Inglis A.S., Fleming A.I., Appleby C.A. FEBS Lett. 231:341-346(1988).

[ 4] Bogusz D., Appleby C.A., Landsmann J., Dennis E.S., Trinick M.J., Peacock W.J. Nature 331:178-180(1988).
[0789] 262. Fructose-bisphosphate aldolase class-I active site (glycolytic_enz)

[0790] Fructose-bisphosphate aldolase [1,2] is a glycolytic enzyme that catalyzes the reversible aldol cleavage or
condensation of fructose-1,6-bisphosphate into dihydroxyacetone-phosphate and glyceraldehyde 3-phosphate. There
are two classes of fructose-bisphosphate aldolases with different catalytic mechanisms. Class I aldolases [3], mainly
found in higher eukaryotes, are homotetrameric enzymes which form a Schiff-base intermediate between the C-2
carbonyl group of the substrate (dihydroxyacetone phosphate) and the epsilon-amino group of a lysine residue. In
vertebrates, three forms of this enzyme are found: aldolase A in muscle, aldolase B in liver and aldolase C in brain.

[0791] Consensus pattern: [LIVM]-x-[LIVMFYW]-E-G-x-[LS]-L-K-P-[SN] [K is involved in Schiff-base formation]

[ 1] Perham R.N. Biochem. Soc. Trans. 18:185-187(1990). 

[ 2] Marsh J.J., Lebherz H.G. Trends Biochem. Sci. 17:110-113(1992).

[ 3] Freemont P.S., Dunbar B., Fothergill-Gilmore L.A. Biochem. J. 249:779-788(1988).

[0792] 263. Glycosyl hydrolases family 11 active sites signatures

The microbial degradation of cellulose and xylans requires several types of enzymes such as endoglucanases (EC
3.2.1.4), cellobiohydrolases (EC 3.2.1.91) (exoglucanases), or xylanases (EC 3.2.1.8) [1,2]. Fungi and bacteria
produce a spectrum of cellulolytic enzymes (cellulases) and xylanases which, on the basis of sequence similarities, can
be classified into families. One of these families is known as the cellulase family G [3] or as the glycosyl hydrolases
family 11 [4]. The enzymes which are currently known to belong to this family are listed below. - Aspergillus aw[...]
[...] and subtilis (xynA).

[0793] Consensus pattern: [PSA]-[LQ]-x-E-Y-Y-[LIVM](2)-[DE]-x-[FYWHN] [E is an active site residue]
Consensus pattern: [LIVMF]-x(2)-E-[AG]-[YWG]-[QRFGS]-[SG]-[STAN]-G-x-[SAF] [E is an active site residue]

[ 1] Beguin P. Annu. Rev. Microbiol. 44:219-248(1990).

[ 2] Gilkes N.R., Henrissat B., Kilburn D.G., Miller R.C. Jr., Warren R.A.J. Microbiol. Rev. 55:303-315(1991).

[ 3] Henrissat B., Claeyssens M., Tomme P., Lemesle L., Mornon J.-P. Gene 81:83-95(1989).
[ 4] Henrissat B. Biochem. J. 280:309-316(1991).

[ 5] [...] Moriyama H., Shinmyo A., Hata Y., Katsube Y., Urabe I., Okada H. Biochem. J. [...]

[0794] 264. Glycosyl hydrolase family 14
[0795] The members of this family are beta-amylases.
[ 2] Gilkes N.R., Henrissat B., Kilburn D.G., Miller R.C. Jr., Warren R.A.J. Microbiol. Rev. 55:303-315(1991).
[ 3] Henrissat B., Claeyssens M., Tomme P., Lemesle L., Mornon J.-P. Gene 81:83-95(1989).
[ 4] Henrissat B. Biochem. J. 280:309-316(1991).

[ 5] Tomme P., Chauvaux S., Beguin P., Millet J., Aubert J.-P., Claeyssens M. J. Biol. Chem. 266:10313-10318(1991).

[ 6] Tomme P., van Beeumen J., Claeyssens M. Biochem. J. 285:319-324(1992).
[0779] 258. Matrix protein (MA), p15 (GAG_ma) 

[0780] The matrix protein, p15, is encoded by the gag gene. MA is involved in pathogenicity [1]. 
[0781] [1] Pozsgay JM, Beilharz MW, Wines BD, Hess AD, Pitha PM, J Virol 1993;67:5989-5999.
[0782] 259. Gag polyprotein, inner coat protein p12 (GAG_P12)

[0783] The retroviral p12 is a virion structural protein. p12 is proline rich. The function carried out by p12 in assembly
and replication is unknown. p12 is associated with pathogenicity of the virus.

[1] Pozsgay JM, Beilharz MW, Wines BD, Hess AD, Pitha PM, J Virol 1993;67:5989-5999.

[0784] 260. Glutamine synthetase signatures (GLN-SYNT) 

Glutamine synthetase (EC 6.3.1.2 ) (GS) [1] plays an essential role in the metabolism of nitrogen by catalyzing the 
condensation of glutamate and ammonia to form glutamine. There seem to be three different classes of GS [2,3,4]: -
Class I enzymes (GSI) are specific to prokaryotes, and are oligomers of 12 identical subunits. The activity of GSI-type
enzyme is controlled by the adenylation of a tyrosine residue. The adenylated enzyme is inactive. - Class II enzymes
(GSII) are found in eukaryotes and in bacteria belonging to the Rhizobiaceae, Frankiaceae, and Streptomycetaceae
families (these bacteria also have a class-I GS). GSII are octamers of identical subunits. Plants have two or more
isozymes of GSII; one of the isozymes is translocated into the chloroplast. - Class III enzymes (GSIII) have currently
only been found in Bacteroides fragilis and in Butyrivibrio fibrisolvens. GSIII is a hexamer of identical chains. It is much
larger (about 700 amino acids) than the GSI (450 to 470 amino acids) or GSII (350 to 420 amino acids) enzymes.
While the three classes of GS's are clearly structurally related, the sequence similarities are not so extensive. As
signature patterns three conserved regions were selected. The first pattern is based on a conserved tetrapeptide in
the N-terminal section of the enzyme; the second one is based on a glycine-rich region which is thought to be involved
in ATP-binding. The third pattern is specific to class I glutamine synthetases and includes the tyrosine residue which 
is reversibly adenylated. 

[0785] Consensus pattern: [FYWL]-D-G-S-S-x(6,8)-[DENQSTAK]-[SA]-[DE]-x(2)-[LIVMFY]

Consensus pattern: K-P-[LIVMFYA]-x(3,5)-[NPAT]-G-[GSTAN]-G-x-H-x(3)-S

Consensus pattern: K-[LIVM]-x(5)-[LIVMA]-D-[RK]-[DN]-[LI]-Y [Y is the site of adenylation]
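
The text states that the first two patterns cover glutamine synthetases in general, while the third is specific to class I enzymes because it includes the adenylated tyrosine. The following minimal sketch shows how the three signatures might be applied together to a sequence. It assumes the patterns read as [FYWL]-D-G-S-S-x(6,8)-[DENQSTAK]-[SA]-[DE]-x(2)-[LIVMFY], K-P-[LIVMFYA]-x(3,5)-[NPAT]-G-[GSTAN]-G-x-H-x(3)-S and K-[LIVM]-x(5)-[LIVMA]-D-[RK]-[DN]-[LI]-Y; the dictionary keys, function names and toy sequences are hypothetical, and a pattern hit is only a hint, not proof of class membership.

```python
import re

# Hand translations of the three glutamine synthetase patterns (assumed readings).
GS_SIGNATURES = {
    "N-terminal tetrapeptide region": re.compile(
        r"[FYWL]DGSS.{6,8}[DENQSTAK][SA][DE].{2}[LIVMFY]"
    ),
    "glycine-rich region": re.compile(
        r"KP[LIVMFYA].{3,5}[NPAT]G[GSTAN]G.H.{3}S"
    ),
    "adenylation site": re.compile(
        r"K[LIVM].{5}[LIVMA]D[RK][DN][LI]Y"
    ),
}


def matched_signatures(sequence: str) -> list[str]:
    """Return the names of the signatures found anywhere in the sequence."""
    return [name for name, rx in GS_SIGNATURES.items() if rx.search(sequence)]


def has_adenylation_signature(sequence: str) -> bool:
    """True when the class I specific adenylation-site pattern is present."""
    return "adenylation site" in matched_signatures(sequence)


# Hypothetical toy fragments, not real glutamine synthetase sequences.
N_TERM = "FDGSSAAAAAADSEAAL"
GLY_RICH = "KPLAAANGGGAHAAAS"
ADENYLATION = "KLAAAAALDRDLY"

TOY_CLASS_I = N_TERM + "GG" + GLY_RICH + "GG" + ADENYLATION
TOY_OTHER = N_TERM + "GG" + GLY_RICH

print(matched_signatures(TOY_CLASS_I))
print(matched_signatures(TOY_OTHER))
```

Keeping the three patterns in one mapping makes it easy to report which signatures fired, which matters here because only the combination, and in particular the third pattern, distinguishes class I from the other classes.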

[ 1] Eisenberg D., Almassy R.J., Janson C.A., Chapman M.S., Suh S.W., Cascio D., Smith W.W. Cold Spring 
Harbor Symp. Quant. Biol. 52:483-490(1987). 

[ 2] Kumada Y., Benson D.R., Hillemann D., Hosted T.J., Rochefort D.A., Thompson C.J., Wohlleben W., Tateno Y.
Proc. Natl. Acad. Sci. U.S.A. 90:3009-3013(1993).

[ 3] Shatters R.G., Kahn M.L. J. Mol. Evol. 29:422-428(1989).

[ 4] Brown J.R., Masuchi Y., Robb F.T., Doolittle W.F. J. Mol. Evol. 38:566-576(1994).

[0786] 261. Globins profile (globin1)

Globins are heme-containing proteins involved in binding and/or transporting oxygen [1]. They belong to a very large 
and well studied family which is widely distributed in many organisms. The major groups of globins are: - Hemoglobins 
(Hb) from vertebrates. Hb is the protein responsible for transporting oxygen from the lungs to other tissues. It is a 
tetramer of two alpha and two beta chains. Most vertebrate species also express specific embryonic or fetal forms of 
hemoglobin where the alpha or the beta chains are replaced by a chain with higher oxygen affinity, as for the gamma, 
delta, epsilon and zeta chains in mammals, for example. - Myoglobins (Mg) from vertebrates. Mg is a monomeric 
protein responsible for oxygen storage in muscles. - Invertebrate globins [2]. A wide variety of globins are found in 
invertebrates. Molluscs generally have one or two muscle globins which are either monomeric or dimeric. Insects, such
as the midge Chironomus thummi, have a large set of extracellular globins. Nematodes and annelids have a variety
of intracellular and extracellular globins; some of them are multi-domain polypeptides (from two up to nine-domain
globins) and some produce large, disulfide-bonded aggregates. - Leghemoglobins (Lg) from the root nodules of leguminous
plants. Lg provides oxygen for bacteroids. - Flavohemoproteins from bacteria (Escherichia coli hmpA) and
fungi [3]. These proteins consist of two distinct domains: an N-terminal globin domain and a C-terminal FAD-containing 
reductase domain. In bacteria such as Vitreoscilla, the enzyme-associated globin is a single domain protein. All these 
globins seem to have evolved from a common ancestor. The profile developed to detect members of the globin family 
is based on a structural alignment of selected globin sequences 

[ 1] Concise Encyclopedia Biochemistry, Second Edition, Walter de Gruyter, Berlin New York (1988).
[ 2] Goodman M.,
acting as a nucleophile. This region was used as a signature pattern. As a second signature pattern we selected a
conserved region, found in the N-terminal extremity of these enzymes; this region also contains a glutamic acid residue.
[0769] Consensus pattern [LIVMFSTC]-[LIVFYS]-[LIV]-[LIVMST]-E-N-G-[LIVMFAR]-[CSAGN] [E is the active site
residue] Sequences known to belong to this class detected by the pattern ALL.

[0770] Note: this pattern will pick up the last two domains of LPH; the first two domains, which are removed from the
precursor by proteolytic processing, have lost the active site glutamate and may therefore be inactive [4].
[0771] Consensus pattern F-x-[FYWM]-[GSTA]-x-[GSTA]-x-[GSTA](2)-[FYNH]-[NQ]-x-E-x-[GSTA] Sequences
known to belong to this class detected by the pattern ALL.
[0772] Note: this pattern will pick up the last three domains of LPH. 
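
The note above implies that a single pattern can hit several times in a multi-domain protein such as LPH. A minimal sketch of counting such hits is given below, assuming the N-terminal pattern reads F-x-[FYWM]-[GSTA]-x-[GSTA]-x-[GSTA](2)-[FYNH]-[NQ]-x-E-x-[GSTA]. The toy protein is a hypothetical construct with three copies of a motif that satisfies the pattern; it is not the LPH sequence, and the names used are illustrative only.

```python
import re

# Hand translation of the N-terminal family 1 signature (assumed reading).
GH1_N_TERMINAL = re.compile(r"F.[FYWM][GSTA].[GSTA].[GSTA]{2}[FYNH][NQ].E.[GSTA]")


def domain_hits(sequence: str) -> list[int]:
    """Return 1-based start positions of all non-overlapping pattern matches."""
    return [m.start() + 1 for m in GH1_N_TERMINAL.finditer(sequence)]


# Hypothetical 15-residue motif that satisfies the pattern, repeated three
# times with arbitrary linkers to mimic tandem domains.
MOTIF = "FAFGAGAGGFNAEAG"
TOY_PROTEIN = "MKK" + MOTIF + "LLLLL" + MOTIF + "PPPPP" + MOTIF + "WW"

hits = domain_hits(TOY_PROTEIN)
print(len(hits), "hits at positions", hits)
```

re.finditer reports non-overlapping matches from left to right, which is adequate when the repeated domains are hundreds of residues apart as described for LPH.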

[ 1] Henrissat B. Biochem. J. 280:309-316(1991). 

[ 2] Henrissat B. Protein Seq. Data Anal. 4:61-62(1991). 

[ 3] Gonzalez-Candelas L, Ramon D., Polaina J. Gene 95:31-38(1990). 

[ 4] El Hassouni M., Henrissat B., Chippaux M., Barras F. J. Bacteriol. 174:765-777(1992).

[ 5] Withers S.G., Warren R.A.J., Street I.P., Rupitz K., Kempton J.B., Aebersold R. J. Am. Chem. Soc. 112:5887-5889(1990).

[0773] 256. Glyco_hydro_20 

Glycosyl hydrolase family 20 

Previous Pfam IDs: glycosyl_hydr11;

Number of members: 33 

[0774] 257. (Glyco_hydro_9) 

Glycosyl hydrolases family 9 active sites signatures 

(aka Glycosyl_hydr12) 

[0775] The microbial degradation of cellulose and xylans requires several types of enzymes such as endoglucanases
(EC 3.2.1.4), cellobiohydrolases (EC 3.2.1.91) (exoglucanases), or xylanases (EC 3.2.1.8) [1,2]. Fungi and bacteria
produce a spectrum of cellulolytic enzymes (cellulases) and xylanases which, on the basis of sequence similarities,
can be classified into families. One of these families is known as the cellulase family E [3] or as the glycosyl hydrolases
family 9 [4,E1]. The enzymes which are currently known to belong to this family are listed below.

- Butyrivibrio fibrisolvens cellodextrinase 1 (ced1).

- Cellulomonas fimi endoglucanases B (cenB) and C (cenC).

- Clostridium cellulolyticum endoglucanase G (celCCG).

- Clostridium cellulovorans endoglucanase C (engC).

- Clostridium stercorarium endoglucanase 2 (avicelase I) (celZ).

- Clostridium thermocellum endoglucanases D (celD), F (celF) and I (celI).

- Fibrobacter succinogenes endoglucanase A (endA).

- Pseudomonas fluorescens endoglucanase A (celA).

- Streptomyces reticuli endoglucanase 1 (cel1).

- Thermomonospora fusca endoglucanase E-4 (celD).

- Dictyostelium discoideum spore germination specific endoglucanase 270-6. This slime mold enzyme may digest
the spore cell wall during germination, to release the enclosed amoeba.

- Endoglucanases from plants such as avocado or French bean. In plants this enzyme may be involved in the fruit
ripening process.

[0776] Two of the most conserved regions in these enzymes are centered on conserved residues which have been
shown [5,6], in the endoglucanase D from Clostridium thermocellum, to be important for the catalytic activity. The
first region contains an active site histidine and the second region contains two catalytically important residues: an
aspartate and a glutamate. Both regions were used as signature patterns.

[0777] Consensus pattern [STV]-x-[LIVMFY]-[STV]-x(2)-G-x-[NKR]-x(4)-[PLIVM]-H-x-R [H is an active site residue]
Sequences known to belong to this class detected by the pattern ALL, except for Cellulomonas fimi cenC and
Streptomyces reticuli cel1.

[0778] Consensus pattern [FYW]-x-D-x(4)-[FYW]-x(3)-E-x-[STA]-x(3)-N-[STA] [D and E are active site residues]
Sequences known to belong to this class detected by the pattern ALL, except for Fibrobacter succinogenes endA whose
sequence seems to be incorrect.



[ 1] Beguin P. Annu. Rev. Microbiol. 44:219-248(1990).
[0759] Prokaryotic, eukaryotic PG and exoPG share a few regions of sequence similarity. The best conserved of
these regions was selected. It is centered on a conserved histidine most probably involved in the catalytic mechanism
[4].

[0760] Consensus pattern [GSDENKRH]-x(2)-[VMFC]-x(2)-[GS]-H-G-[LIVMAG]-x(1,2)-[LIVM]-G-S [H is the putative
active site residue] Sequences known to belong to this class detected by the pattern ALL.
[0761] Note: these proteins belong to family 28 in the classification of glycosyl hydrolases [5].

[ 1] Ruttkowski E., Labitzke R., Khanh N.Q., Loeffler F., Gottschalk M., Jany K.-D. Biochim. Biophys. Acta 1087:
104-106(1990).

[ 2] Huang J., Schell M.A. J. Bacteriol. 172:3879-3887(1990).

[ 3] He S.Y., Collmer A. J. Bacteriol. 172:4988-4995(1990).

[ 4] Bussink H.J.D., Buxton F.P., Visser J. Curr. Genet. 19:467-474(1991).

[ 5] Henrissat B. Biochem. J. 280:309-316(1991).

[0762] 254. (Glyco_hydro_32)
Glycosyl hydrolases family 32 active site

[0763] It has been shown [1,2] that the following glycosyl hydrolases can be classified into a single family on the
basis of sequence similarities:

- Inulinase (EC 3.2.1.7) (or inulase) from the fungus Kluyveromyces marxianus.

- Beta-fructofuranosidase (EC 3.2.1.26), commonly known as invertase in fungi and plants and as sucrase in bacteria
(gene sacA or scrB).

- Raffinose invertase (EC 3.2.1.26) (gene rafD) from Escherichia coli plasmid pRSD2.

- Levanase (EC 3.2.1.65) (gene sacC) from Bacillus subtilis.

[0764] One of the conserved regions in these enzymes is located in the N-terminal section and contains an aspartic 
acid residue which has been shown [3], in yeast invertase, to be important for the catalytic mechanism. This region was
used as a signature pattern. 

[0765] Consensus pattern H-x(2)-P-x(4)-[LIVM]-N-D-P-N-G [D is the active site residue] Sequences known to belong 
to this class detected by the pattern ALL 

[ 1] Henrissat B. Biochem. J. 280:309-316(1991). 

[ 2] Gunasekaran P., Karunakaran T., Cami B., Mukundan A.G., Preziosi L., Baratti J. J. Bacteriol. 172:6727-6735
(1990).

[ 3] Reddy V.A., Maley F. J. Biol. Chem. 265:10817-10820(1990).

[0766] 255. (Glyco_hydro_1) 
Glycosyl hydrolases family 1 signatures 

[0767] It has been shown [1 to 4] that the following glycosyl hydrolases can be, on the basis of sequence similarities,
classified into a single family:

- Beta-glucosidases (EC 3.2.1.21) from various bacteria such as Agrobacterium strain ATCC 21400, Bacillus polymyxa,
and Caldocellum saccharolyticum.

- Two plant (clover) beta-glucosidases (EC 3.2.1.21).

- Two different beta-galactosidases (EC 3.2.1.23) from the archaebacterium Sulfolobus solfataricus (genes bgaS and
lacS).

- 6-phospho-beta-galactosidases (EC 3.2.1.85) from various bacteria such as Lactobacillus casei, Lactococcus lactis,
and Staphylococcus aureus.

- 6-phospho-beta-glucosidases (EC 3.2.1.86) from Escherichia coli (genes bglB and ascB) and from Erwinia
chrysanthemi (gene arbB).

- Plant myrosinases (EC 3.2.3.1) (sinigrinase) (thioglucosidase).

- Mammalian lactase-phlorizin hydrolase (LPH) (EC 3.2.1.108 / EC 3.2.1.62). LPH, an integral membrane glycoprotein,
is the enzyme that splits lactose in the small intestine. LPH is a large protein of about 1900 residues which
contains four tandem repeats of a domain of about 450 residues which is evolutionarily related to the above glycosyl
hydrolases.



[0768] One of the conserved regions in these enzymes is centered on a conserved glutamic acid residue which has 
been shown [5], in the beta-glucosidase from Agrobacterium, to be directly involved in glycosidic bond cleavage by 
[0748] 251. (Glyco_hydro_17)
Glycosyl hydrolases family 17 signature
(aka glycosyl_hydro4)

[0749] It has been shown [1,2] that the following glycosyl hydrolases can be classified into a single family on the
basis of sequence similarities:

- Glucan endo-1,3-beta-glucosidases (EC 3.2.1.39) (endo-(1->3)-beta-glucanase) from various plants. This enzyme
may be involved in the defense of plants against pathogens through its ability to degrade fungal cell wall
polysaccharides.

- Glucan 1,3-beta-glucosidase (EC 3.2.1.58) (exo-(1->3)-beta-glucanase) from yeast (gene BGL2). This enzyme
may play a role in cell expansion during growth, in cell-cell fusion during mating, and in spore release during
sporulation.

- Lichenases (EC 3.2.1.73) (endo-(1->3,1->4)-beta-glucanase) from various plants.

[0751] Consensus pattern [LIVM]-x-[LIVMFYWA](3)-[STAG]-E-[STA]-G-W-P-[STN]-x-[SAGQ] [E is an active site
residue] Sequences known to belong to this class detected by the pattern ALL.

[ 1] Henrissat B. Biochem. J. 280:309-316(1991).

[ 2] Ori N., Sessa G., Lotan T., Himmelhoch S., Fluhr R. EMBO J. 9:3429-3436(1990).

[ 3] Varghese J.N., Garrett T.P.J., Colman P.M., Chen L., Hoj P.J., Fincher G.B. Proc. Natl. Acad. Sci. U.S.A. 91:
2785-2789(1994).

[0752] 252. (Glyco_hydro_3)
Glycosyl hydrolases family 3 active site

[0753] It has been shown [1,2] that the following glycosyl hydrolases can be classified into a single family on the
basis of sequence similarities:

- Beta glucosidases (EC 3.2.1.21) from the fungi Aspergillus wentii (A-3), Hansenula anomala, Kluyveromyces fragilis,
Saccharomycopsis fibuligera (BGL1 and BGL2), Schizophyllum commune and Trichoderma reesei (BGL1).

- Beta glucosidases from the bacteria Agrobacterium tumefaciens (Cbg1), Butyrivibrio fibrisolvens (bglA), Clostridium
thermocellum (bglB), Escherichia coli (bglX), Erwinia chrysanthemi (bgxA) and Ruminococcus albus.

- Alteromonas strain 0-7 beta-hexosaminidase A (EC 3.2.1.52).

- Bacillus subtilis hypothetical protein yzbA.

- Escherichia coli hypothetical protein ycfO and HI0959, the corresponding Haemophilus influenzae protein.

[0754] One of the conserved regions in these enzymes is centered on a conserved aspartic acid residue which has

[0755] Consensus pattern [LIVM](2)-[KR]-x-[EQK]-x(4)-G-[LIVMFT]-[LIVT]-[LIVMF]-[ST]-D-x(2)-[SGADNI] [D is the
active site residue] Sequences known to belong to this class detected by the pattern ALL.

[ 1] Henrissat B. Biochem. J. 280:309-316(1991).

[ 2] Castle L.A., Smith K.D., Morris R.O. J. Bacteriol. 174:1478-1486(1992).
[ 3] Bause E., Legler G. Biochim. Biophys. Acta 626:459-465(1980).

[0756] 253. (Glyco_hydro_28)
Polygalacturonase active site (aka PG)

[0757] Polygalacturonase (EC 3.2.1.15) (PG) (pectinase) catalyzes the random hydrolysis of 1,4-alpha-D-ga-
collectively termed MAGUKs (membrane-associated guanylate kinase homologs) [5]: - Drosophila lethal(1)discs large-1
tumor suppressor protein (gene dlg1). This protein is associated with septate junctions in developing flies and defects
in the dlg1 gene cause neoplastic overgrowth of the imaginal disks. - Mammalian tight junction protein Zo-1. - A family
of mammalian synaptic proteins that seem to interact with the cytoplasmic tail of NMDA receptor subunits. This family
currently consists of SAP90/PSD-95, CHAPSYN-110/PSD-93, SAP97/DLG1 and SAP102. - Vertebrate 55 Kd erythrocyte
membrane protein (p55). p55 is a palmitoylated, membrane-associated protein of unknown function. - Caenorhabditis
elegans protein lin-2, which may play a structural role in the induction of the vulva. - Rat protein CASK. - Human
protein DLG2. - Human protein DLG3. There is an ATP-binding site (P-loop) in the N-terminal section of GK. This region
is not conserved in the GK-like domain of the above proteins which are therefore unlikely to be kinases. However these
proteins retain the residues known, in GK, to be involved in the binding of GMP. As a signature pattern a highly conserved
region was selected that contains two arginines and a tyrosine which are involved in GMP-binding.
[0738] Consensus pattern: T-[ST]-R-x(2)-[KR]-x(2)-[DE]-x(2)-G-x(2)-Y-x-[FY]-[LIVMK]

[ 1] Stehle T., Schulz G.E. J. Mol. Biol. 224:1127-1141(1992).
[ 2] Bryant P.J., Woods D.F. Cell 68:621-622(1992).
[ 3] Goebl M.G. Trends Biochem. Sci. 17:99-99(1992).

[ 4] Zschocke P.D., Schiltz E., Schulz G.E. Eur. J. Biochem. 213:263-269(1993).
[ 5] Woods D.F., Bryant P.J. Mech. Dev. 44:85-89(1994).

[0739] 249. (Glyco_hydro_35) 

Glycosyl hydrolases family 35 putative active site 

[0740] Beta-galactosidases (EC 3.2.1.23) from mammals, fungi, plants and the bacterium Xanthomonas manihotis are
evolutionarily related [1,2]. They belong to family 35 in the classification of glycosyl hydrolases [3,E1].
[0741] Mammalian beta-galactosidase is a lysosomal enzyme (gene GLB1) which cleaves the terminal galactose
from gangliosides, glycoproteins, and glycosaminoglycans and whose deficiency is the cause of the genetic disease
Gm(1) gangliosidosis (Morquio disease type B).

[0742] One of the best conserved regions in these enzymes contains a glutamic acid residue which, on the basis of
similarities with other families of glycosyl hydrolases [4], probably acts as the proton donor in the catalytic mechanism.
This region was used as a signature pattern.

[0743] Consensus pattern: G-G-P-[LIVM](2)-x(2)-Q-x-E-N-E-[FY] [The second E is the putative active site residue]
Sequences known to belong to this class detected by the pattern ALL.

[ 1] Taron C.H., Benner J.S., Hornstra L.J., Guthrie E.P. Glycobiology 5:603-610(1995).

[ 2] Carey A.T., Holt K., Picard S., Wilde R., Tucker G.A., Bird C.R., Schuch W., Seymour G.B. Plant Physiol. 108:
1099-1107(1995).

[ 3] Henrissat B., Bairoch A. Biochem. J. 293:781-788(1993).

[ 4] Henrissat B., Callebaut I., Fabrega S., Lehn P., Mornon J.-P., Davies G. Proc. Natl. Acad. Sci. U.S.A. 92:
7090-7094(1995).

[0744] 250. (Glyco_hydro_16)
Glycosyl hydrolases family 16 signature 

[0745] It has been shown [1] that the following glycosyl hydrolases can be classified into a single family on the basis 
of sequence similarities: 

- Bacterial beta-1,3-1,4-glucanases, or lichenases, (EC 3.2.1.73) mainly from Bacillus but also from Clostridium
thermocellum (gene licB), Fibrobacter succinogenes and Rhodothermus marinus (gene bglA).
- Bacillus circulans beta-1,3-glucanase A1 (EC 3.2.1.39) (gene glcA).
- Laminarinase (EC 3.2.1.6) from Clostridium thermocellum (gene lam1).
- Streptomyces coelicolor agarase (EC 3.2.1.81) (gene dagA).
- Alteromonas carrageenovora kappa-carrageenase (EC 3.2.1.83) (gene cgkA).

[0746] Two closely clustered conserved glutamates have been shown [2] to be involved in the catalytic activity of
Bacillus licheniformis lichenase. The region that contains these residues was used as a signature pattern.

[0747] Consensus pattern E-[LIV]-D-[LIV]-x(0,1)-E-x(2)-[GQ]-[KRNF]-x-[PSTA] [The two E's are active site residues]

[ 1] Henrissat B. Biochem. J. 280:309-316(1991). 

[ 2] Juncosa M., Pons J., Dot T., Querol E., Planas A. J. Biol. Chem. 269:14530-14535(1994).
[0730] The proteins known to belong to this family are:

- Glypican 1 (GPC1).

- Glypican 2 (GPC2) or cerebroglycan.

- Glypican 3 (GPC3) or OCI-5. In man, defects in GPC3 are the cause of an X-linked genetic disease,
Simpson-Golabi-Behmel syndrome (SGBS).

- K-glypican.

- Glypican 5 (GPC5). 

- Drosophila protein dally. 

[0731] The signature pattern that was developed for glypicans is located in the central section of the extracellular 
domain and contains five of the conserved cysteines. 

[0732] Consensus pattern C-x(2)-C-x-G-[LIVM]-x(4)-P-C-x(2)-[FY]-C-x(2)-[LIVM]-x(2)-G-C [The C's are probably
involved in disulfide bonds] Sequences known to belong to this class detected by the pattern ALL, except for dally.

[ 1] Weksberg R., Squire J.A., Templeton D.M. Nat. Genet. 12:225-227(1996).
[ 2] Watanabe K., Yamada H., Yamaguchi Y. J. Cell Biol. 130:1207-1218(1995).

[0733] 246. Granins signatures 

Granins (chromogranins or secretogranins) [1] are a family of acidic proteins present in the secretory granules of a
wide variety of endocrine and neuro-endocrine cells. The exact function(s) of these proteins is not yet known but they
seem to be the precursors of biologically active peptides and/or they may act as helper proteins in the packaging of
peptide hormones and neuropeptides. Three members of this family of proteins show some sequence similarities: -
Chromogranin A (CGA) [2]. CGA is a protein of about 420 residues; it is the precursor of the peptide pancreastatin
which strongly inhibits glucose-induced insulin release from the pancreas. - Secretogranin 1 (chromogranin B). A
sulfated protein of about 600 residues. - Secretogranin 2 (chromogranin C). A sulfated protein of about 650 residues.
Apart from their subcellular location and the abundance of acidic residues (Asp and Glu), these proteins do not share
many structural similarities. Only one short region, located in the C-terminal section, is conserved in all these proteins.
Chromogranins A and B share a region of high similarity in their N-terminal section; this region includes two cysteine
residues involved in a disulfide bond.

[0734] Consensus pattern: [DE]-[SN]-L-[SAN]-x(2)-[DE]-x-E-L

[ 1] Huttner W.B., Gerdes H.-H., Rosa P. Trends Biochem. Sci. 16:27-30(1991).
[ 2] Simon J.-P., Aunis D. Biochem. J. 262:1-13(1989).

[0735] 247. grpE protein signature 

In prokaryotes the grpE protein [1] stimulates, jointly with dnaJ, the ATPase activity of the dnaK chaperone. It seems
to accelerate the release of ADP from dnaK, thus allowing dnaK to recycle more efficiently. GrpE is a protein of about
[...] an evolutionarily related mitochondrial protein (gene GRPE) has been shown [2] to associate with
the mitochondrial hsp70 protein and thus to play a role in the import of proteins from the cytoplasm. As a signature
pattern the most conserved region of grpE was selected. It is located in the C-terminal section.
[0736] Consensus pattern: [FL]-[DN]-[PHEA]-x(2)-[HM]-x-A-[LIVMTN]-x(16,20)-G-[FY]-x(3)-[DEG]-x(2)-[LIVM]-

[ 1] Georgopoulos C., Welch W. Annu. Rev. Cell Biol. 9:601-635(1993).

[ 2] [...] Georgopoulos C., Jenoe P., Kronidou N., Horst M., [...]

[0737] 248. Guanylate kinase signature and profile 

Guanylate kinase (EC 2.7.4.8) (GK) [1] catalyzes the ATP-dependent phosphorylation of GMP into GDP. It is essential
for recycling GMP and, indirectly, cGMP. In prokaryotes (such as Escherichia coli), lower eukaryotes (such as yeast)
and in vertebrates, GK is a highly conserved monomeric protein of about 200 amino acids. GK has been shown [2,3,4]
to be structurally similar to the following proteins: - Protein A57R (or SalG2R) from various strains of Vaccinia virus.
This protein is highly similar to GK, but contains a frameshift mutation in the N-terminal section and could therefore be

of the DHR domain, a SH3 domain (see <PDOC50002>) as well as a C-terminal GK-like domain; these proteins are
[0721] [1] Medline: 95252686. A family of UDP-GlcNAc/MurNAc: polyisoprenol-P GlcNAc/MurNAc-1-P transferases.

Lehrman MA; Glycobiology 1994;4:768-771.

[0722] 241. Glycosyl hydrolases family 15. 21 members.

[0723] 242. Glycosyl hydrolases family 16 signature 

It has been shown [1] that the following glycosyl hydrolases can be classified into a single family on the basis of
sequence similarities: - Bacterial beta-1,3-1,4-glucanases, or lichenases, (EC 3.2.1.73) mainly from Bacillus but also
from Clostridium thermocellum (gene licB), Fibrobacter succinogenes and Rhodothermus marinus (gene bglA). -
Bacillus circulans beta-1,3-glucanase A1 (EC 3.2.1.39) (gene glcA). - Laminarinase (EC 3.2.1.6) from Clostridium
thermocellum (gene lam1). - Streptomyces coelicolor agarase (EC 3.2.1.81) (gene dagA). - Alteromonas carrageenovora
kappa-carrageenase (EC 3.2.1.83) (gene cgkA). Two closely clustered conserved glutamates have been shown [2] to
be involved in the catalytic activity of Bacillus licheniformis lichenase. The region that contains these residues was used
as a signature pattern.

[0724] Consensus pattern: E-[LIV]-D-[LIV]-x(0,1)-E-x(2)-[GQ]-[KRNF]-x-[PSTA] [The two E's are active site residues]

[ 1] Henrissat B. Biochem. J. 280:309-316(1991). 

[ 2] Juncosa M., Pons J., Dot T., Querol E., Planas A. J. Biol. Chem. 269:14530-14535(1994).
[0725] 243. Glycosyl hydrolases family 17 signature

It has been shown [1,2) that the following glycosyl hydrolases can be classified into a single family on the basis of 
sequence similarities: - Glucan endo-1,3-beta-g!ucosidases (EC 3.2.1.39 ) (endo-(1->3)-beta- glucanase) from various 
plants. This enzyme may be involved in the defense of plants against pathogens through its ability to degrade fungal 
cell wall polysaccharides. - Glucan 1 ,3-beta-glucosidase (EC 3.2.1.58 ) (exo-(1 ->3)-beta-glucanase) from yeast (gene 
BGL2). This enzyme may play a role in cell expansion during growth, in cell-cell fusion during mating, and in spore 
release during sporulation. - Lichenases (EC 32.1.73) (endo-(1->3.1->4)-beta-glucanase) from various plants. The 
best conserved region in the sequence of these enzymes is located in their central section. This region contains a 
conserved tryptophan residue which could be involved in the interaction with the glucan substrates (2] and it also 
contains a conserved glutamate which has been shown [3] to act as the nucleophile in the catalytic mechanism, this 
region was used as a signature pattern. 

Consensus pattern: [LIVM]-x-[LIVMFYWA](3)-[STAG)-E-[STAl-G-W-P-[STN]-x-[SAGQ] [E is an active site residue]- 
[ 1] Henrissat B. Biochem. J. 280:309-316(1991). 

[ 2] Ori N., Sessa G., Lotan T., Himmelhoch S., Fluhr R. EMBO J. 9:3429-3436(1990).

[ 3] Varghese J.N., Garrett T.P.J., Colman P.M., Chen L., Hoj P.J., Fincher G.B. Proc. Natl. Acad. Sci. U.S.A. 91:2785-2789(1994).

[0726] 244. Glyoxalase I signatures 

Glyoxalase I (EC 4.4.1.5) (lactoylglutathione lyase) catalyzes the first step of the glyoxal pathway, the transformation
of methylglyoxal and glutathione into S-lactoylglutathione, which is then converted by glyoxalase II to lactic acid [1].
Glyoxalase I is a ubiquitous enzyme which binds one mole of zinc per subunit. The bacterial and yeast enzymes are
monomeric while the mammalian one is homodimeric. The sequence of glyoxalase I is well conserved. In bacteria and
mammals, the enzyme is a protein of about 130 to 180 residues, while in fungi it is about twice as long. In fungi
the enzyme is built out of the tandem repeat of a homologous domain. Two signature patterns for this family were
derived. The first one is located in the N-terminal region, while the second one is located in the central section of the
protein and contains a conserved histidine that could be implicated in the binding of the zinc atom.
[0727] Consensus pattern: IHQ]-[l\n>x-[LIVFY]-x-[l^ 

Consensus pattern: G-[NTKQ]-x(0,5)-[GA]-[LVFY]-[GH]-H-[IVF]-[CGA]-x-[STAGLE]-x(2)-[DNC]
[0728] [ 1] Kim N.-S., Umezawa Y., Ohmura S., Kato S. J. Biol. Chem. 268:11217-11221(1993).
[0729] 245. (Glypican) 
Glypicans signature

Glypicans [1,2] are a family of heparan sulfate proteoglycans which are anchored to cell membranes by a
glycosyl-phosphatidylinositol (GPI) linkage. Structurally, these proteins consist of three separate domains:

a) A signal sequence; 

b) An extracellular domain of about 500 residues that contains 12 conserved cysteines probably involved in disulfide
bonds and which also contains the sites of attachment of the heparan sulfate glycosaminoglycan side chains;

c) A C-terminal hydrophobic region which is post-translationally removed after formation of the GPI-anchor.
[0703] Consensus pattern: [KR]-[GSAT]-x(4)-[FYWLH]-[DQNGK]-x-P-x-[LIVMFY]-x(3)-H-x(2)-[AG]-H-[LIVM] Sequences
known to belong to this class detected by the pattern: ALL.

[0704] Note: these proteins belong to family M22 in the classification of peptidases [2,E1].

[ 1] Abdullah K.M., Lo R.Y.C., Mellors A. J. Bacteriol. 173:5597-5603(1991).
[ 2] Rawlings N.D., Barrett A.J. Meth. Enzymol. 248:183-228(1995).

[0705] 235. (Glucosamine_iso) 

Glucosamine/galactosamine-6-phosphate isomerases signature 

n^ZLST * 11 ' E * 5henchB coh hypothetical protein yieK. - Bacillus subtilis hypothetical protein vbfTAs 

[0706] Consensus pattern: [LIVM]-x(3)-G-x-[LIT]-x-[LIV]-x-[LIVM]-x-G-[LIVM]-G-x-[DEN]-G-H

[ 1] Oliva G., Fontes M.R.M., Garratt R.C., Altamirano M.M., Calcagno M.L., Horjales E. Structure 3:1323-1332(1995).
[ 2] Reizer J., Ramseier T.M., Reizer A., Charbit A., Saier M.H. Jr. Microbiology 142:231-250(1996).

[0707] 236. Pneumovirus attachment glycoprotein G (glycoprotein G) 

This family includes attachment proteins from respiratory syncytial virus. Glycoprotein G has not been shown
to have neuraminidase or hemagglutinin activity (Swiss-Prot). The amino terminus is thought

[0712] 238. Glycosyl transferases (Glycos_transf_2) 
Thymidine and pyrimidine-nucleoside phosphorylases signature

SSJ^KSS^^ < EC ^ to« »*> H is an enzyme evolutionary and 

ELymeT^eS 89 " °' " reSk,U6S ^ h ^ °' P' oteins si 9— Pattern for 

Kete^ 

iiS^^r W ^ ^ K0SZa,ka G W - Kreni,Sky TA ' E ^ k S E J Bi01 - Ch - 255: 
[ 2] Furukawa T., Yoshimura A., Sumizawa T., Haraguchi M., Akiyama S., Fukui K., Yamada Y. Nature 356:668-668

[ 3] Saxild H.H., Andersen L.N., Hammer K. J. Bacteriol. 178:424-434(1996).

[0720] 240. Glycos_transf_4. Glycosyl transferase. Number of members: 44.
disulfide bonds]

[1] Bruix M., Jimenez M.A., Santoro J., Gonzalez C., Colilla F.J., Mendez E., Rico M. Biochemistry 32:715-724(1993).

[2] Gu Q., Kawata E.E., Morse M.-J., Wu H.-M., Cheung A.Y. Mol. Gen. Genet. 234:89-96(1992).

[3] Terras F.R.G., Torrekens S., van Leuven F., Osborn R.W., Vanderleyden J., Cammue B.P.A., Broekaert W.F.
FEBS Lett. 316:233-240(1993).

[4] Bloch C. Jr., Richardson M. FEBS Lett. 279:101-104(1991).

[5] Ishibashi N., Yamauchi D., Minamikawa T. Plant Mol. Biol. 15:59-64(1990).

[7] Choi Y., Choi Y.D., Lee J.S. Plant Physiol. 101:699-700(1993).

[0689] 231. Gelsolin. Gelsolin repeat. Number of members: 170 

[0690] [1] Medline: 97433077. The crystal structure of plasma gelsolin: implications for actin severing, capping, and
nucleation. Burtnick LD, Koepf EK, Grimes J, Jones EY, Stuart DI, McLaughlin PJ, Robinson RC; Cell 1997;90:661-670.
[0691] 232. Germin family signature 

Germins [1] are a family of homopentameric cereal glycoproteins expressed during germination which may play a role
in altering the properties of cell walls during germinative growth. It has been shown that wheat and barley germins act
as oxalate oxidases (EC 1.2.3.4), an enzyme that catalyzes the oxidative degradation of oxalate to carbonate and
hydrogen peroxide. Germins are highly similar to: - Germin-like proteins from various plants such as rape, violet or
white mustard. - Slime mold spherulins 1a and 1b, which are proteins that accumulate specifically during spherulation,
a process induced by various forms of environmental stress which leads to encystment and dormancy. As a signature
pattern the best conserved region was selected: a decapeptide located in the central section of these proteins.
[0692] Consensus pattern: G-x(4)-H-x-H-P-x-A-x-E-[LIVM]- 
[0693] [1] Lane B.G. FASEB J. 8:294-301(1994). 
[0694] 233. (GlutR) 
Glutamyl-tRNA reductase signature 

[0695] Delta-aminolevulinic acid (ALA) is the obligatory precursor for the synthesis of all tetrapyrroles including
porphyrin derivatives such as chlorophyll and heme. ALA can be synthesized via two different pathways: the Shemin (or
C4) pathway, which involves the single-step condensation of succinyl-CoA and glycine and which is catalyzed by ALA
synthase (EC 2.3.1.37), and the C5 pathway, which starts from the five-carbon skeleton of glutamate. The C5 pathway operates
in the chloroplast of plants and algae, in cyanobacteria, in some eubacteria and in archaebacteria.
[0696] The initial step in the C5 pathway is carried out by glutamyl-tRNA reductase (GluTR) [1], which catalyzes the
NADP-dependent conversion of glutamate-tRNA(Glu) to glutamate-1-semialdehyde (GSA) with the concomitant
release of tRNA(Glu), which can then be recharged with glutamate by glutamyl-tRNA synthetase.
[0697] GluTR is a protein of about 50 Kd (467 to 550 residues) which contains a few conserved regions. The best
conserved region is located in positions 99 to 122 in the sequence of known GluTR. This region seems important for
the activity of the enzyme. We have developed a signature pattern from that conserved region.
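As a rough plausibility check on the size quoted for GluTR, chain length can be converted to an approximate mass. The sketch below assumes an average residue mass of about 110 Da, which is a common rule of thumb and not a value given in the text; the function name approx_mass_kd is hypothetical.

```python
AVERAGE_RESIDUE_MASS_DA = 110.0  # assumed rule-of-thumb average mass per residue

def approx_mass_kd(n_residues, residue_mass=AVERAGE_RESIDUE_MASS_DA):
    """Rough molecular mass in kilodaltons for a chain of n_residues."""
    return n_residues * residue_mass / 1000.0

# Residue range quoted in the text for GluTR.
low, high = approx_mass_kd(467), approx_mass_kd(550)
print(round(low, 1), round(high, 1))
```

Under this assumption the 467-residue end of the range comes out close to the quoted figure of about 50 Kd, while the longest sequences would be somewhat heavier.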
[0698] Consensus pattern: H-[LIVM]-x(2)-[LIVM]-[GSTAC](3)-[LIVM]-[DEQ]-S-[LIVMA]-[LIVM](2)-[GF]-E-x-[EQR]-
[IV]-[LIT]-[STAG]-Q-[LIVM]-[KR] Sequences known to belong to this class detected by the pattern: ALL.
[0699] [1] Jahn D., Verkamp E., Soell D. Trends Biochem. Sci. 17:215-218(1992). 
[0700] 234. (Glycoprotease) 
Glycoprotease family signature (aka Peptidase_M22) 

[0701] Glycoprotease (GCP) (EC 3.4.24.57) [1], or O-sialoglycoprotein endopeptidase, is a metalloprotease secreted
by Pasteurella haemolytica which specifically cleaves O-sialoglycoproteins such as glycophorin A. The sequence of
GCP is highly similar to the following uncharacterized proteins:

Escherichia coli hypothetical protein ygjD (ORF-X). 
Bacillus subtilis hypothetical protein ydiE. 
Mycobacterium leprae hypothetical protein U229E. 
Mycobacterium tuberculosis hypothetical protein M1CY78.10. 
Synechocystis strain PCC 6803 hypothetical protein slr0807. 
Methanococcus jannaschii hypothetical protein MJ1130. 
Haloarcula marismortui hypothetical protein in HSH 3' region.
Yeast hypothetical protein YKR038c. 
Yeast hypothetical protein QR17. 

[0702] One of the conserved regions contains two conserved histidines. It is possible that this region is involved in 
coordinating a metal ion such as zinc. 
Chem. 266:10429-10437(1991).

[ 7] Forchhammer K., Leinfelder W., Bock A. Nature 342:453-456(1989).
[ 8] Manavathu E.K., Hiratsuka K., Taylor D.E. Gene 62:17-26(1988).

101 C fi ^t D ' J p l ll I ,tmaS B M - Smith C J - W?r FC - J - 170:3618-3626(1988) 

[12] Moller W., Schipper A., Amons R. Biochimie 69:983-989(1987). 
[0681] 228. GTP cyclohydrolase II

GTP cyclohydrolase II catalyzes the first committed step in the synthesis of riboflavin.

iSSJJST" 1 ^ Ktonm ' hr G " V ° lk ^ KOh " te * ""P"* F " « ■ Bacher A, J Bacter* 1993; 

S^f^K^K^'^'^^^^'^^s^^MGalP UDP transf) 

Consensus pattern: F-E-N-[RK]-G-x(3)-G-x(4)-H-P-H-x-Q [The two H's are the active site residues]

[0684] Consensus pattern: D-L-P-l-V-G-G-[ST]-{LIVMK2)-fSAl-H-rDENl-H IFYl-O-r V m , . 

struc.ura.ly related to the HIT family of proteins (see <pgj ' ' t^^^ Note: ctass -' are 

[ 1] Reichardt J.K.V., Berg P. Nucleic Acids Res. 16:9017-9026(1988).
[ 2] Mollet B., Pilloud N. J. Bacteriol. 173:4464-4473(1991).

[0685] 230. Gamma-thionins family signature 

[0686] The following small plant proteins are evolutionarily related:

- A flower-specific thionin (FST) from tobacco [2].

- ™^ 

- Inhibitors of insect alpha-amylases from sorghum [4].

- Probable protease inhibitor P322 from potato.

- A germination-related protein from cowpea [5].

- Soybean sulfur-rich protein SE60 [7]. 

- Vicia faba antibacterial peptides fabatin-1 and -2. 

H h|+ 

xx CxxxxxxxxxxCxxxxxCxxxCxxxxxxxxxCxxxxxxCxCxxxC * • , 

| +H + |+ _ + I II 

'C': conserved cysteine involved in a disulfide bond.
'*': position of the pattern.

[0688] Consensus pattern: [KRG]-x-C-x(3)-[SV]-x(2)-[FYWH]-x-[GF]-x-C-x(5)-C-x(3)-C [The four C's are involved in
binding proteins alpha subunits (Gi, Gs, Gt, G0, etc.). - DNA mismatch repair proteins mutS family (see
<PDOC00388>). - Bacterial type II secretion system protein E (see <PDOC00567>). Not all ATP- or GTP-binding
proteins are picked up by this motif. A number of proteins escape detection because the structure of their ATP-binding
site is completely different from that of the P-loop. Examples of such proteins are the E1-E2 ATPases or the glycolytic
kinases. In other ATP- or GTP-binding proteins the flexible loop exists in a slightly different form; this is the case for
tubulins or protein kinases. A special mention must be reserved for adenylate kinase, in which there is a single deviation
from the P-loop pattern: in the last position Gly is found instead of Ser or Thr.

Consensus pattern: [AG]-x(4)-G-K-[ST]
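The P-loop pattern is short, so a sequence may contain several hits, some overlapping. The following sketch scans a sequence for every occurrence and offers an optional relaxed variant that also accepts Gly in the last position, the adenylate kinase deviation described above. The function name find_p_loops, the relaxed pattern and the short test sequences are assumptions made for illustration only.

```python
import re

# Strict pattern from the text: [AG]-x(4)-G-K-[ST].
# A lookahead is used so that overlapping occurrences are all reported.
P_LOOP = re.compile(r"(?=([AG].{4}GK[ST]))")
# Hypothetical relaxed variant admitting Gly in the last position.
P_LOOP_RELAXED = re.compile(r"(?=([AG].{4}GK[STG]))")

def find_p_loops(sequence, relaxed=False):
    """Return (1-based start, matched segment) for every hit, overlaps included."""
    regex = P_LOOP_RELAXED if relaxed else P_LOOP
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence.upper())]

# Illustrative fragments: a Ras-like N-terminus and an adenylate-kinase-like one.
ras_like = "MTEYKLVVVGAGGVGKSALT"
adk_like = "MRIILLGAPGAGKGTQA"
print(find_p_loops(ras_like))
print(find_p_loops(adk_like), find_p_loops(adk_like, relaxed=True))
```

Relaxing the last position recovers the adenylate-kinase-like fragment but can be expected to increase chance matches, which is presumably why the published pattern keeps [ST].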



[ 1] Walker J.E., Saraste M., Runswick M.J., Gay N.J. EMBO J. 1:945-951(1982).
[ 2] Moller W., Amons R. FEBS Lett. 186:1-7(1985).

[ 3] Fry D.C., Kuby S.A., Mildvan A.S. Proc. Natl. Acad. Sci. U.S.A. 83:907-911(1986).
[ 4] Dever T.E., Glynias M.J., Merrick W.C. Proc. Natl. Acad. Sci. U.S.A. 84:1814-1818(1987).
[ 5] Saraste M., Sibbald P.R., Wittinghofer A. Trends Biochem. Sci. 15:430-434(1990).
[ 6] Koonin E.V. J. Mol. Biol. 229:1165-1174(1993).
[ 7] Higgins C.F., Hyde S.C., Mimmack M.M., Gileadi U., Gill D.R., Gallagher M.P. J. Bioenerg. Biomembr. 22:571-592(1990).

[ 8] Hodgman T.C. Nature 333:22-23(1988) and Nature 333:578-578(1988) (Errata).
[ 9] Linder P., Lasko P., Ashburner M., Leroy P., Nielsen P.J., Nishi K., Schnier J., Slonimski P.P. Nature 337:121-122(1989).

[10] Gorbalenya A.E., Koonin E.V., Donchenko A.P., Blinov V.M. Nucleic Acids Res. 17:4713-4730(1989). 



[0679] GTP-binding elongation factors signature (GTP_EFTU2) 

Elongation factors [1,2] are proteins catalyzing the elongation of peptide chains in protein biosynthesis. In both
prokaryotes and eukaryotes, there are three distinct types of elongation factors, as described in the following table:

Eukaryotes    Prokaryotes    Function
EF-1alpha     EF-Tu          Binds GTP and an aminoacyl-tRNA; delivers the latter to the A site of ribosomes.
EF-1beta      EF-Ts          Interacts with EF-1a/EF-Tu to displace GDP and thus allows the regeneration of GTP-EF-1a.
EF-2          EF-G           Binds GTP and peptidyl-tRNA and translocates the latter from the A site to the P site.

The GTP-binding elongation factor family also includes the following proteins:

- Eukaryotic peptide chain release factor GTP-binding subunits [3]. These proteins interact with release factors that bind to ribosomes that have encountered a stop codon at their decoding site and help them to induce release of the nascent polypeptide. The yeast protein was known as SUP2 (and also as SUP35, SUF12 or GST1) and the human homolog as GST1-Hs.
- Prokaryotic peptide chain release factor 3 (RF-3) (gene prfC). RF-3 is a class-II RF, a GTP-binding protein that interacts with class I RFs (see <PDOC00607>) and enhances their activity [4].
- Prokaryotic GTP-binding protein lepA and its homolog in yeast (gene GUF1) and in Caenorhabditis elegans (ZK1236.1).
- Yeast HBS1 [5].
- Rat statin S1 [6], a protein of unknown function which is highly similar to EF-1alpha.
- Prokaryotic selenocysteine-specific elongation factor selB [7], which seems to replace EF-Tu for the insertion of selenocysteine directed by the UGA codon.
- The tetracycline resistance proteins tetM/tetO [8,9] from various bacteria such as Campylobacter jejuni, Enterococcus faecalis, Streptococcus mutans and Ureaplasma urealyticum. Tetracycline binds to the prokaryotic ribosomal 30S subunit and inhibits binding of aminoacyl-tRNAs. These proteins abolish the inhibitory effect of tetracycline on protein synthesis.
- Rhizobium nodulation protein nodQ [10].
- Escherichia coli hypothetical protein yihK [11].

In EF-1alpha, a specific region has been shown [12] to be involved in a conformational change mediated by the hydrolysis of GTP to GDP. This region is conserved in both EF-1alpha/EF-Tu and EF-2/EF-G and thus seems typical for GTP-dependent proteins which bind non-initiator tRNAs to the ribosome. The pattern developed for this family of proteins includes that conserved region.

[0680] Consensus pattern: D-[KRSTGANQFYW]-x(3)-E-[KRAQ]-x-[RKQD]-[GC]-[IVMK]-[ST]- [IV]-x(2)-[GSTACK- 

[ 1] Concise Encyclopedia Biochemistry, Second Edition, Walter de Gruyter, Berlin New-York (1988).
[ 2] Moldave K. Annu. Rev. Biochem. 54:1109-1149(1985).

[ 3] Stansfield I., Jones K.M., Kushnirov V.V., Dagkesamanskaya A.R., Poznyakovski A.I., Paushkin S.V., Nierras
C.R., Cox B.S., Ter-Avanesyan M.D., Tuite M.F. EMBO J. 14:4365-4373(1995).

[ 4] Grentzmann G., Brechemier-Baey D., Heurgue-Hamard V., Buckingham R.H. J. Biol. Chem. 270:10595-10600(1995).

[ 5] Nelson R.J., Ziegelhoffer T., Nicolet C., Werner-Washburne M., Craig E.A. Cell 71:97-105(1992).

[ 6] Ann D.K., Moutsatsos I.K., Nakamura T., Lin H.H., Mao P.-L., Lee M.-J., Chin S., Liem R.K.H., Wang E. J. Biol.
the significance of this relationship is not yet clear. Selenium, in the form of selenocysteine [7], is part of the catalytic
site of GSHPx. The sequence around the selenocysteine residue is moderately well conserved in GSHPx's and the
related proteins and can be used as a signature pattern. As a second signature for this family of proteins, a highly
conserved octapeptide located in the central section of these proteins was selected.

[0672] Consensus pattern: [GNHRKHNITOhx^W^^ 

site selenocysteine residue) 

Consensus pattern: [LIV]-[AGD]-F-P-[CS]-[NG]-Q

[ 1] Mannervik B. Meth. Enzymol. 113:490-495(1985).

[ 2] Mullenbach G.T., Tabrizi A., Irvine B.D., Bell G.I., Tainer J.A., Hallewell R.A. Protein Eng. 2:239-246(1988).
[ 3] Chu F.F., Doroshow J.H., Esworthy R.S. J. Biol. Chem. 268:2571-2576(1993).

[ 4] Takahashi K., Akasaka M., Yamamoto Y., Kobayashi C., Mizoguchi J., Koyama J. J. Biochem. 108:145-148(1990).

[ 5] Dunn D.K., Howells D.D., Richardson J., Goldfarb P.S. Nucleic Acids Res. 17:6390-6390(1989).
[ 6] Cookson E., Blaxter M.L., Selkirk M.E. Proc. Natl. Acad. Sci. U.S.A. 89:5837-5841(1992).
[ 7] Stadtman T.C. Annu. Rev. Biochem. 59:111-127(1990).

[0673] 225. (GST) 
Glutathione S-transferases 

[0674] Function: conjugation of reduced glutathione to a variety of targets. Also included in the alignment, but not GSTs, are:

- S-crystallins from squid. Similarity to GST was previously noted.
- Eukaryotic elongation factors 1-gamma. Not known to have GST activity; similarity not previously recognized. Supported by HMM and manual alignment inspection.
- HSP26 family of stress-related proteins, including auxin-regulated proteins in plants and stringent starvation proteins in E. coli. Not known to have GST activity. Similarity not previously recognized. Supported by HMM and manual alignment inspection.

Alignment spans entire protein.
[0675] 226. GTP1/OBG family signature 

A widespread family of GTP-binding proteins has been recently characterized [1,2]. This family currently includes: -
Mouse and Xenopus protein DRG. - Human protein DRG2. - Drosophila protein 128up. - Fission yeast protein gtp1. -
A Halobacterium cutirubrum hypothetical protein in a ribosomal protein gene cluster. - Bacillus subtilis protein obg.
Obg has been experimentally shown to bind GTP. - Escherichia coli hypothetical protein yhbZ. - Haemophilus influenzae
hypothetical protein HI0877. - Mycoplasma genitalium hypothetical protein MG384. - Yeast hypothetical protein
YAL036C (FUN11). - Yeast hypothetical protein YGR173W. - Caenorhabditis elegans hypothetical protein C02F5.3. The
function of the proteins that belong to this family is not yet known. They are polypeptides of about 40 to 48 Kd which
contain the five small sequence elements characteristic of GTP-binding proteins [3]. As a signature pattern the region
that corresponds to the ATP/GTP B motif (also called G-3 in GTP-binding proteins) was selected.
[0676] Consensus pattern: D-[LIVM]-P-G-[LIVM](2)-[DEY]-[GN]-A-x(2)-G-x-G

[ 1] Sazuka T., Tomooka Y., Ikawa Y., Noda M., Kumar S. Biochem. Biophys. Res. Commun. 189:363-370(1992).

[ 2] Hudson J.D., Young P.G. Gene 125:191-193(1993).

[ 3] Bourne H.R., Sanders D.A., McCormick F. Nature 349:117-127(1991).

[0677] 227. (GTP_EFTU1) 
ATP/GTP-binding site motif A (P-loop) 

[0678] From sequence comparisons and crystallographic data analysis it has been shown [1,2,3,4,5,6] that an
appreciable proportion of proteins that bind ATP or GTP share a number of more or less conserved sequence motifs.
The best conserved of these motifs is a glycine-rich region, which typically forms a flexible loop between a beta-strand
and an alpha-helix. This loop interacts with one of the phosphate groups of the nucleotide. This sequence motif is
generally referred to as the 'A' consensus sequence [1] or the 'P-loop' [5]. There are numerous ATP- or GTP-binding
proteins in which the P-loop is found. Listed below are a number of protein families for which the relevance of the
presence of such motif has been noted:

- ATP synthase alpha and beta subunits (see <PDOC00137>).
- Myosin heavy chains.
- Kinesin heavy chains and kinesin-like proteins (see <PDOC00343>).
- Dynamins and dynamin-like proteins (see <PDOC00362>).
- Guanylate kinase (see <PDOC00670>).
- Thymidine kinase (see <PDOC00524>).
- Thymidylate kinase (see <PDOC01034>).
- Shikimate kinase (see <PDOC00868>).
- Nitrogenase iron protein family (nifH/frxC) (see <PDOC00580>).
- ATP-binding proteins involved in 'active transport' (ABC transporters) [7] (see <PDOC00185>).
- DNA and RNA helicases [8,9,10].
- GTP-binding elongation factors (EF-Tu, EF-1alpha, EF-G, EF-2, etc.).
- Ras family of GTP-binding proteins (Ras, Rho, Rab, Ral, Ypt1, SEC4, etc.).
- Nuclear protein ran (see <PDOC00859>).
- ADP-ribosylation factors family (see <PDOC00781>).
- Bacterial dnaA protein (see <PDOC00771>).
- Bacterial recA protein (see <PDOC00131>).
- Bacterial recF protein (see <PDOC00539>).
- Guanine nucleotide-
similarities two classes of GATase domains have been identified [2,3]: class-I (also known as trpG-type) and class-II
(also known as purF-type). Class-I GATase domains have been found in the following enzymes:

- The second component of anthranilate synthase (AS) (EC 4.1.3.27) [4]. AS catalyzes the biosynthesis of anthranilate
from chorismate and glutamine. AS is generally a dimeric enzyme: the first component can synthesize anthranilate
using ammonia rather than glutamine, whereas component II provides the GATase activity. In some
bacteria and in fungi the GATase component of AS is part of a multifunctional protein that also catalyzes other
steps of the biosynthesis of tryptophan.

- The second component of 4-amino-4-deoxychorismate (ADC) synthase (EC 4.1.3.-), a dimeric prokaryotic enzyme
that functions in the pathway that catalyzes the biosynthesis of para-aminobenzoate (PABA) from chorismate and
glutamine. The second component (gene pabA) provides the GATase activity [4].

- CTP synthase (EC 6.3.4.2). CTP synthase catalyzes the final reaction in the biosynthesis of pyrimidine, the
ATP-dependent formation of CTP from UTP and glutamine. CTP synthase is a single chain enzyme that contains two
distinct domains; the GATase domain is in the C-terminal section [2].

- GMP synthase (glutamine-hydrolyzing) (EC 6.3.5.2). GMP synthase catalyzes the ATP-dependent formation of
GMP from xanthosine 5'-phosphate and glutamine. GMP synthase is a single chain enzyme that contains two
distinct domains; the GATase domain is in the N-terminal section [5].

- Glutamine-dependent carbamoyl-phosphate synthase (EC 6.3.5.5) (GD-CPSase); an enzyme involved in both 
arginine and pyrimidine biosynthesis and which catalyzes the ATP-dependent formation of carbamoyl phosphate 
from glutamine and carbon dioxide. In bacteria GD-CPSase is composed of two subunits: the large chain (gene 
carB) provides the CPSase activity, while the small chain (gene carA) provides the GATase activity. In yeast the 
enzyme involved in arginine biosynthesis is also composed of two subunits: CPA1 (GATase) and CPA2 (CPSase).
In most eukaryotes, the first three steps of pyrimidine biosynthesis are catalyzed by a large multifunctional enzyme 
(called URA2 in yeast, rudimentary in Drosophila, and CAD in mammals). The GATase domain is located at the 
N-terminal extremity of this polyprotein [6]. 

- Phosphoribosylformylglycinamidine synthase II (EC 6.3.5.3), an enzyme that catalyzes the fourth step in the de 
novo biosynthesis of purines. In some species of bacteria, FGAM synthase II is composed of two subunits: a small 
chain (gene purQ) which provides the GATase activity and a large chain (gene purL) which provides the aminator 
activity. 

- The histidine amidotransferase hisH, an enzyme that catalyzes the fifth step in the biosynthesis of histidine in 
prokaryotes. 

[0668] In the second component of AS a cysteine has been shown [7] to be essential for the amidotransferase activity.
The sequence around this residue is well conserved in all the above GATase domains and can be used as a signature
pattern for class-I GATase.

[0669] Consensus pattern: [PAS]-[LIVMFYT]-[LIVMFY]-G-[LIVMFY]-C-[LIVMFYN]-G-x-[QEH]-x-[LIVMFA] [C is the
active site residue] Sequences known to belong to this class detected by the pattern: ALL, except for 6 sequences.
[0670] Note: in the first position of the pattern Pro is found in all cases except in the slime mold GD-CPSase where
it is replaced by Ala.

[ 1] Buchanan J.M. Adv. Enzymol. 39:91-183(1973). 

[ 2] Weng M., Zalkin H. J. Bacteriol. 169:3023-3028(1987). 

[ 3] Nyunoya H., Lusty C.J. J. Biol. Chem. 259:9790-9798(1984). 

[ 4] Crawford I.P. Annu. Rev. Microbiol. 43:567-600(1989).

[ 5] Zalkin H., Argos P., Narayana S.V.L., Tiedeman A.A., Smith J.M. J. Biol. Chem. 260:3350-3354(1985). 
[ 6] Davidson J.N., Chen K.C., Jamison R.S., Musmanno L.A., Kern C.B. BioEssays 15:157-164(1993). 
[ 7] Tso J.Y., Hermodson M.A., Zalkin H. J. Biol. Chem. 255:1451-1457(1980). 

[0671] 224. Glutathione peroxidases signatures (GSHPx) 

Glutathione peroxidase (EC 1.11.1.9) (GSHPx) [1,2] is an enzyme that catalyzes the reduction of hydroperoxides
by glutathione. Its main function is to protect against the damaging effect of endogenously formed hydroperoxides.
In higher vertebrates at least four forms of GSHPx are known to exist: a ubiquitous cytosolic form (GSHPx-1), a
gastrointestinal cytosolic form (GSHPx-GI) [3], a plasma secreted form (GSHPx-P) [4], and an epididymal secretory form
(GSHPx-EP). In addition to these characterized forms, the sequence of a protein of unknown function [5] has been
shown to be evolutionarily related to those of GSHPx's. In filarial nematode parasites such as Brugia pahangi the major
soluble cuticular protein, known as gp29, is a secreted GSHPx which could provide a mechanism of resistance to the
immune reaction of the mammalian host by neutralizing the products of the oxidative burst of leukocytes [6]. Escherichia
coli protein btuE, a periplasmic protein involved in the transport of vitamin B12, is also evolutionarily related to GSHPx's;
- Glutamate dehydrogenases (EC 1.4.1.2, EC 1.4.1.3, and EC 1.4.1.4) (GluDH) are enzymes that catalyze the NAD-
or NADP-dependent reversible deamination of glutamate into alpha-ketoglutarate [1,2]. GluDH isozymes are
generally involved with either ammonia assimilation or glutamate catabolism.

- Leucine dehydrogenase (EC 1.4.1.9) (LeuDH) is a NAD-dependent enzyme that catalyzes the reversible
deamination of leucine and several other aliphatic amino acids to their keto analogues [3].

- Phenylalanine dehydrogenase (EC 1.4.1.20) (PheDH) is a NAD-dependent enzyme that catalyzes the reversible
deamination of L-phenylalanine into phenylpyruvate [4].

- Valine dehydrogenase (EC 1.4.1.8) (ValDH) is a NADP-dependent enzyme that catalyzes the reversible
deamination of L-valine into 3-methyl-2-oxobutanoate [5].

[0661] These dehydrogenases are structurally and functionally related. A conserved lysine residue located in a
glycine-rich region has been implicated in the catalytic mechanism. The conservation of the region around this residue
allows the derivation of a signature pattern for this type of enzyme.

[0662] Consensus pattern: [LIV]-x(2)-G-G-[SAG]-K-x-[GV]-x(3)-[DNST]-[PL] [K is the active site residue] Sequences
known to belong to this class detected by the pattern: ALL.

[0663] Note: all known sequences from this family have Pro in the last position of the pattern with the exception of
yeast GluDH, which has Leu.

[ 1] Britton K.L., Baker P.J., Rice D.W., Stillman T.J. Eur. J. Biochem. 209:851-859(1992).
[ 2] Benachenhou-Lahfa N., Forterre P., Labedan B. J. Mol. Evol. 36:335-346(1993).

[ 3] Nagata S., Tanizawa K., Esaki N., Sakamoto Y., Ohshima T., Tanaka H., Soda K. Biochemistry 27:9056-9062(1988).

[ 4] Takada H., Yoshimura T., Ohshima T., Esaki N., Soda K. J. Biochem. 109:371-376(1991).

[ 5] Hutchinson C.R., Tang L. J. Bacteriol. 175:4176-4185(1993).

[0664] 222. GMC oxidoreductases signatures 

The following FAD flavoprotein oxidoreductases have been found [1,2] to be evolutionarily related. These enzymes
are listed below:

- Glucose oxidase (EC 1.1.3.4) (GOX). Reaction catalyzed: glucose + oxygen -> delta-gluconolactone + hydrogen peroxide.
- Methanol oxidase (EC 1.1.3.13) (MOX). Reaction catalyzed: methanol + oxygen -> acetaldehyde + hydrogen peroxide.
- Choline dehydrogenase (EC 1.1.99.1) (CHD) from bacteria. Reaction catalyzed: choline + unknown acceptor -> betaine acetaldehyde + reduced acceptor.
- Glucose dehydrogenase (GLD) (EC 1.1.99.10) from Drosophila. Reaction catalyzed: glucose + unknown acceptor -> delta-gluconolactone + reduced acceptor.
- Cholesterol oxidase (CHOD) (EC 1.1.3.6) from Brevibacterium sterolicum and Streptomyces strain SA-COO. Reaction catalyzed: cholesterol + oxygen -> cholest-4-en-3-one + hydrogen peroxide.
- AlkJ [3], an alcohol dehydrogenase from Pseudomonas oleovorans, which converts aliphatic medium-chain-length alcohols into aldehydes.

This family also includes a lyase:

- (R)-mandelonitrile lyase (EC 4.1.2.10) (hydroxynitrile lyase) from plants [4], an enzyme involved in cyanogenesis, the release of hydrogen cyanide from injured tissues.

These enzymes are proteins of size ranging from 556 (CHD) to 664 (MOX) amino acid residues
which share a number of regions of sequence similarities. One of these regions, located in the N-terminal section,
corresponds to the FAD ADP-binding domain. The function of the other conserved domains is not yet known; two of
these domains were selected as signature patterns. The first one is located in the N-terminal section of these enzymes,
about 50 residues after the ADP-binding domain, while the second one is located in the central section.
[0665] Consensus pattern: [GA]-[RKN]-x-[LIV]-G(2)-[GST](2)-x-[LIVM]-N-x(3)-[FYWA]-x(2)-[PAG]-x(5)-[DNESH]
Consensusr«ttem:[GSHPSTAhx(2HST]^ K)l J x ^lUNhfaHj 

[ 1] Cavener D.R. J. Mol. Biol. 223:811-814(1992).
[ 2] Henikoff S., Henikoff J.G. Genomics 19:97-107(1994).

[ 3] van Beilen J.B., Eggink G., Enequist H., Bos R., Witholt B. Mol. Microbiol. 6:3121-3136(1992).
[ 4] Cheng I.P., Poulton J.E. Plant Cell Physiol. 34:1139-1143(1993).

[0666] 223. (GMP_synt_C) 
Glutamine amidotransferases class-I active site

A large group of biosynthetic enzymes are able to catalyze the removal of the ammonia group from glutamine
and then to transfer this group to a substrate to form a new carbon-nitrogen group. This catalytic activity is known as
glutamine amidotransferase (GATase) (EC 2.4.2.-) [1]. The GATase domain exists either as a separate polypeptide
subunit or as part of a larger polypeptide fused in different ways to a synthase domain. On the basis of sequence
pattern for class-I GATase.

[0648] Consensus pattern: [PAS]-[LIVMFYT]-[LIVMFY]-G-[LIVMFY]-C-[LIVMFYN]-G-x-[QEH]-x-[LIVMFA] [C is the
active site residue]

[ 1] Buchanan J.M. Adv. Enzymol. 39:91-183(1973).

[ 2] Weng M., Zalkin H. J. Bacteriol. 169:3023-3028(1987).

[ 3] Nyunoya H., Lusty C.J. J. Biol. Chem. 259:9790-9798(1984).

[ 4] Crawford I.P. Annu. Rev. Microbiol. 43:567-600(1989).

[ 5] Zalkin H., Argos P., Narayana S.V.L., Tiedeman A.A., Smith J.M. J. Biol. Chem. 260:3350-3354(1985).
[ 6] Davidson J.N., Chen K.C., Jamison R.S., Musmanno L.A., Kern C.B. BioEssays 15:157-164(1993).
[ 7] Tso J.Y., Hermodson M.A., Zalkin H. J. Biol. Chem. 255:1451-1457(1980).

[0649] 216. Glutamine amidotransferases class-ll active site (GATase_2) 

A large group of biosynthetic enzymes are able to catalyze the removal of the ammonia group from glutamine and then
to transfer this group to a substrate to form a new carbon-nitrogen group. This catalytic activity is known as glutamine
amidotransferase (GATase) (EC 2.4.2.-) [1]. The GATase domain exists either as a separate polypeptide subunit or
as part of a larger polypeptide fused in different ways to a synthase domain. On the basis of sequence similarities two
classes of GATase domains have been identified [2,3]: class-I (also known as trpG-type) and class-II (also known as
purF-type). Class-II GATase domains have been found in the following enzymes:

- Amido phosphoribosyltransferase (glutamine phosphoribosylpyrophosphate amidotransferase) (EC 2.4.2.14). An enzyme which catalyzes the first step in purine biosynthesis, the transfer of the ammonia group of glutamine to PRPP to form 5-phosphoribosylamine (gene purF in bacteria, ADE4 in yeast).
- Glucosamine-fructose-6-phosphate aminotransferase (EC 2.6.1.16). This enzyme catalyzes a key reaction in amino sugar synthesis, the formation of glucosamine 6-phosphate from fructose 6-phosphate and glutamine (gene glmS in Escherichia coli, nodM in Rhizobium, GFA1 in yeast).
- Asparagine synthetase (glutamine-hydrolyzing) (EC 6.3.5.4). This enzyme is responsible for the synthesis of asparagine from aspartate and glutamine.

A cysteine is present at the N-terminal extremity of the mature form of all these enzymes. The cysteine has
been shown, in amido phosphoribosyltransferase [4] and in asparagine synthetase [5], to be important for the catalytic
mechanism.

[0650] Consensus pattern: <x(0,11)-C-[GS]-[IV]-[LIVMFYW]-[AG] [C is the active site residue]

[ 1] Buchanan J.M. Adv. Enzymol. 39:91-183(1973).

[ 2] Weng M., Zalkin H. J. Bacteriol. 169:3023-3028(1987).

[ 3] Nyunoya H., Lusty C.J. J. Biol. Chem. 259:9790-9798(1984).

[ 4] van Heeke G., Schuster M. J. Biol. Chem. 264:5503-5509(1989).

[ 5] Vollmer S.J., Switzer R.L., Hermodson M.A., Bower S.G., Zalkin H. J. Biol. Chem. 258:10582-10585(1983).
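
A consensus pattern of the kind given for this entry can be tested against a sequence by translating it into a regular expression. The following is a minimal sketch only. It assumes the usual PROSITE conventions (x for any residue, [..] for allowed residues, {..} for excluded residues, (n) or (n,m) for repeat counts, < and > for the ends of the sequence). The function name prosite_to_regex and the test sequences are hypothetical and are not taken from this document.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style consensus pattern into a Python regex string.

    Illustrative helper (hypothetical name); it handles only the simple
    syntax used in the consensus patterns of this document.
    """
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        element = element.strip()
        if not element:
            continue  # tolerate a stray trailing separator
        prefix = suffix = ""
        if element.startswith("<"):  # pattern anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):    # pattern anchored at the C-terminus
            suffix, element = "$", element[:-1]
        # Separate the residue specification from an optional (n) or (n,m).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.group(1), m.group(2), m.group(3)
        if core == "x":
            core = "."                       # any residue
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"   # excluded residues
        repeat = ""
        if low is not None:
            repeat = "{" + low + ("," + high if high is not None else "") + "}"
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


# Class-II GATase pattern: the active site cysteine must lie within the
# first twelve residues of the mature protein.
gatase2 = prosite_to_regex("<x(0,11)-C-[GS]-[IV]-[LIVMFYW]-[AG]")
print(gatase2)
print(bool(re.search(gatase2, "MCGIVGIFG")))            # made-up sequence
print(bool(re.search(gatase2, "A" * 20 + "CGIVGIFG")))  # cysteine too far in
```

Because the pattern begins with <, the translated expression is anchored at the start of the sequence, so a cysteine-containing motif deeper in the chain is not reported.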
[0651] 217. GDP dissociation inhibitor (GDI) 

[0652] [1] Schalk I, Zeng K, Wu SK, Stura EA, Matteson J, Huang M, Tandon A, Wilson IA, Balch WE. Nature 1996;
381:42-48.

[0653] 218. Oxidoreductase family (GFO_IDH_MocA)

[0654] This family of enzymes utilises NADP or NAD. This family is called the GFO/IDH/MOCA family in Swiss-Prot.
[0655] [1] Kingston RL, Scopes RK, Baker EN. Structure 1996;4:1413-1428. 
[0656] 219. GHMP kinases putative ATP-binding domain 

The following kinases contain, in their N-terminal section, a conserved Gly/Ser-rich region which is probably involved
in the binding of ATP [1]. These kinases are listed below. - Galactokinase (EC 2.7.1.6). - Homoserine kinase (EC
2.7.1.39). - Mevalonate kinase (EC 2.7.1.36). - Phosphomevalonate kinase (EC 2.7.4.2). This group of kinases was
called 'GHMP' (from the first letter of their substrates).

Consensus pattern: [LIVM]-[PK]-x-[GSTA]-x(0,1)-G-L-[GS]-S-S-[GSA]-[GSTAC]
[0657] [ 1] Tsay Y.H., Robinson G.W. Mol. Cell. Biol. 11:620-631(1991). 
[0658] 220. Glucose inhibited division protein A family signatures (GIDA) 

Bacterial glucose inhibited division protein A (gene gidA) is a protein of 70 Kd whose function is not yet known and
whose sequence is highly conserved. It is evolutionarily related to yeast hypothetical protein YGL236C, Caenorhabditis
elegans hypothetical protein F52H3.2 and a Bacillus subtilis protein called gid (which is different from B. subtilis
gidA). Two highly conserved regions were selected as signature patterns. Both regions are located in the central region
of the protein.

[0659] Consensus pattern: [GS]-[PT]-x-Y-C-P-S-[LIVM]-E-x-K-[LIVM]-x-[KR]

Consensus pattern: A-G-Q-x-[NT]-G-x(2)-G-Y-x-E-[SAG](3)-[QS]-G-[LIVM](2)-A-G-[LIVMT]-N-A

[0660] 221. (GLFV_dehydrog) 
of genes containing the GATA region, including vitellogenin genes [6]. - Ustilago maydis urbs1 [7], a protein involved
in the repression of the biosynthesis of siderophores. - Fission yeast protein GAF2. All these transcription factors contain
a pair of highly similar 'zinc finger' type domains with the consensus sequence C-x2-C-x17-C-x2-C. Some other proteins
contain a single zinc finger motif highly related to those of the GATA transcription factors. These proteins are: -
Drosophila box A-binding factor (ABF) (also known as protein serpent (gene srp)) which may function as a transcriptional
activator protein and may play a key role in the organogenesis of the fat body. - Emericella nidulans areA [8], a
transcriptional activator which mediates nitrogen metabolite repression. - Neurospora crassa nit-2 [9], a transcriptional
activator which turns on the expression of genes coding for enzymes required for the use of a variety of secondary
nitrogen sources, during conditions of nitrogen limitation. - Neurospora crassa white collar proteins 1 and 2 (WC-1 and
WC-2), which control expression of light-regulated genes. - Saccharomyces cerevisiae DAL81 (or UGA43), a negative
nitrogen regulatory protein. - Saccharomyces cerevisiae GLN3, a positive nitrogen regulatory protein. - Saccharomyces
cerevisiae GAT1. - Saccharomyces cerevisiae GZF3.

[0646] Consensus pattern: C-x-[DN]-C-x(4,5)-[ST]-x(2)-W-[HR]-[RK]-x(3)-[GN]-x(3,4)-C-N-[AS]-C [The four C's are
zinc ligands]

[ 1] Trainor C.D., Evans T., Felsenfeld G., Boguski M.S. Nature 343:92-96(1990).

[ 2] Lee M.E., Temizer D.T., Clifford J.A., Quertermous T. J. Biol. Chem. 266:16188-16192(1991).

[ 3] Ho I.-C., Vorhees P., Marin N., Oakley B.K., Tsai S.-F., Orkin S.H., Leiden J.M. EMBO J. 10:1187-1192(1991).

[ 4] Spieth J., Shim Y.H., Lea K., Conrad R., Blumenthal T. Mol. Cell. Biol. 11:4651-4659(1991).

[ 5] Drevet J.R., Skeiky Y.A., Iatrou K. J. Biol. Chem. 269:10660-10667(1994).

[ 6] Hawkins M.G., McGhee J.D. J. Biol. Chem. 270:14666-14671(1995).

[ 7] Voisard C.P.O., Wang J., Xu P., Leong S.A., McEvoy J.L. Mol. Cell. Biol. 13:7091-7100(1993).

[ 8] Arst H.N. Jr., Kudla B., Martinez-Rossi N.M., Caddick M.X., Sibley S., Davies R.W. Trends Genet. 5:291-291(1989).

[ 9] Fu Y.-H., Marzluf G.A. Mol. Cell. Biol. 10:1056-1065(1990).
[0647] 215. Glutamine amidotransferases class-I active site (GATase)

A large group of biosynthetic enzymes are able to catalyze the removal of the ammonia group from glutamine and then
to transfer this group to a substrate to form a new carbon-nitrogen group. This catalytic activity is known as glutamine
amidotransferase (GATase) (EC 2.4.2.-) [1]. The GATase domain exists either as a separate polypeptide subunit or
as part of a larger polypeptide fused in different ways to a synthase domain. On the basis of sequence similarities two
classes of GATase domains have been identified [2,3]: class-I (also known as trpG-type) and class-II (also known as
purF-type). Class-I GATase domains have been found in the following enzymes: - The second component of
anthranilate synthase (AS) (EC 4.1.3.27) [4]. AS catalyzes the biosynthesis of anthranilate from chorismate and glutamine.
AS is generally a dimeric enzyme: the first component can synthesize anthranilate using ammonia rather than
glutamine, whereas component II provides the GATase activity. In some bacteria and in fungi the GATase component
of AS is part of a multifunctional protein that also catalyzes other steps of the biosynthesis of tryptophan. - The second
component of 4-amino-4-deoxychorismate (ADC) synthase (EC 4.1.3.-), a dimeric prokaryotic enzyme that functions
in the pathway that catalyzes the biosynthesis of para-aminobenzoate (PABA) from chorismate and glutamine. The
second component (gene pabA) provides the GATase activity [4]. - CTP synthase (EC 6.3.4.2). CTP synthase catalyzes
the final reaction in the biosynthesis of pyrimidine, the ATP-dependent formation of CTP from UTP and glutamine. CTP
synthase is a single chain enzyme that contains two distinct domains; the GATase domain is in the C-terminal section. -
GMP synthase (glutamine-hydrolyzing) (EC 6.3.5.2). GMP synthase catalyzes the ATP-dependent formation of
GMP from xanthosine 5'-phosphate and glutamine. GMP synthase is a single chain enzyme that contains two distinct
domains; the GATase domain is in the N-terminal section [5]. - Glutamine-dependent carbamoyl-phosphate synthase
(EC 6.3.5.5) (GD-CPSase); an enzyme involved in both arginine and pyrimidine biosynthesis and which catalyzes the
ATP-dependent formation of carbamoyl phosphate from glutamine and carbon dioxide. In bacteria GD-CPSase is
composed of two subunits: the large chain (gene carB) provides the CPSase activity, while the small chain (gene carA)
provides the GATase activity. In yeast the enzyme involved in arginine biosynthesis is also composed of two subunits:
CPA1 (GATase), and CPA2 (CPSase). In most eukaryotes, the first three steps of pyrimidine biosynthesis are catalyzed
by a large multifunctional enzyme (called URA2 in yeast, rudimentary in Drosophila, and CAD in mammals). The
GATase domain is located at the N-terminal extremity of this polyprotein [6]. -
Phosphoribosylformylglycinamidine synthase II (EC 6.3.5.3), an enzyme that catalyzes the fourth step in the de novo
biosynthesis of purines. In some species of bacteria, FGAM synthase II is composed of two subunits: a small chain
(gene purQ) which provides the GATase activity and a large chain (gene purL) which provides the aminator activity. -
The histidine amidotransferase hisH, an enzyme that catalyzes the fifth step in the biosynthesis of histidine in
prokaryotes. In the second component of AS a cysteine has been shown [7] to be essential for the amidotransferase activity.
The sequence around this residue is well conserved in all the above GATase domains and can be used as a signature
members: - Mammalian tub, a hydrophilic protein of about 500 residues, which could be involved in the hypothalamic
regulation of body weight. - Human protein TULP1 [3], which may be involved in retinitis pigmentosa 14, a retinal
degeneration disease. - Mouse protein p4-6 whose function is not known. - Caenorhabditis elegans hypothetical protein
F10B5.4. - Several fragmentary sequences from plants, Drosophila and human ESTs. While the N-terminal part of
these proteins is conserved neither in length nor in sequence, the C-terminal 250 residues are highly conserved.
Therefore, two regions were selected in the C-terminal part as signature patterns. The second region is located at the
C-terminal extremity and contains a penultimate cysteine residue that could be critical to the normal functioning of these
proteins.

Consensus pattern: F-[KHQ]-G-R-V-[ST]-x-A-S-V-K-N-F-Q

Consensus pattern: A-F-[AG]-I-[SAC]-[LIVM]-[ST]-S-F-x-[GST]-K-x-A-C-E

[ 1] Kleyn P.W., Fan W., Kovats S.G., Lee J.L., Pulido J.C., Wu Y., Berkemeier L.R., Misumi D.J., Holmgren L., Charlat
O., Woolf E.A., Tayber O., Brody T., Shu P., Hawkins F., Kennedy B., Baldini L., Ebeling C., Alperin G.D., Deeds J.,
Lakey N.D., Culpepper J., Chen H., Gluecksmann-Kuis M.A., Carlson G.A., Duyk G.M., Moore K.J. Cell 85:281-290
(1996). [ 2] Noben-Trauth K., Naggert J.K., North M.A., Nishina P.M. Nature 380:534-538(1996). [ 3] North M.A., Naggert
J.K., Yan Y., Noben-Trauth K., Nishina P.M. Proc. Natl. Acad. Sci. U.S.A. 94:3128-3133(1997).
[1533] 651. Eukaryotic DNA topoisomerase I active site

DNA topoisomerase I (EC 5.99.1.2) [1,2,3,4,E1] is one of the two types of enzyme that catalyze the interconversion
of topological DNA isomers. Type I topoisomerases act by catalyzing the transient breakage of DNA, one strand at a
time, and the subsequent rejoining of the strands. When a eukaryotic type I topoisomerase breaks a DNA backbone
bond, it simultaneously forms a protein-DNA link where the hydroxyl group of a tyrosine residue is joined to a
3'-phosphate on DNA, at one end of the enzyme-severed DNA strand. In eukaryotes and pox virus topoisomerases I,
there are a number of conserved residues in the region around the active site tyrosine.
Consensus pattern: [DEN]-x(6)-[GS]-[IT]-S-K-x(2)-Y-[LIVM]-x(3)-[LIVM] [Y is the active site tyrosine]
[ 1] Sternglanz R. Curr. Opin. Cell Biol. 1:533-535(1990). [ 2] Sharma A., Mondragon A. Curr. Opin. Struct. Biol. 5:39-47
(1995). [ 3] Lynn R.M., Bjornsti M.-A., Caron P.R., Wang J.C. Proc. Natl. Acad. Sci. U.S.A. 86:3559-3563(1989). [ 4]
Roca J. Trends Biochem. Sci. 20:156-160(1995). [E1]
[1534] 652. Transaldolase signatures 

Transaldolase (EC 2.2.1.2) catalyzes the reversible transfer of a three-carbon ketol unit from sedoheptulose 7-phosphate
to glyceraldehyde 3-phosphate to form erythrose 4-phosphate and fructose 6-phosphate. This enzyme, together
with transketolase, provides a link between the glycolytic and pentose-phosphate pathways. Transaldolase is an enzyme
of about 34 Kd whose sequence has been well conserved throughout evolution. A lysine has been implicated
[1] in the catalytic mechanism of the enzyme; it acts as a nucleophilic group that attacks the carbonyl group of
fructose-6-phosphate. Transaldolase is evolutionarily related [2] to a bacterial protein of about 20 Kd (known as talC in
Escherichia coli), whose exact function is not yet known. Two signature patterns have been developed for these proteins.
The first, located in the N-terminal section, contains a perfectly conserved pentapeptide; the second includes the active
site lysine.

Consensus pattern: [DG]-[IVSA]-T-[ST]-N-P-[STA]-[LIVMF](2)
Consensus pattern: [LIVM]-x-[LIVM]-K-[LIVM]-[PAS]-x-
[K is the active site residue]

[ 1] Miosga T., Schaaff-Gerstenschlaeger I., Franken E., Zimmermann F.K. Yeast 9:1241-1249(1993). [ 2] Reizer J.,
Reizer A., Saier M.H. Jr. Microbiology 141:961-971(1995).

[1535] 653. (Transpeptidase) Penicillin binding protein transpeptidase domain 

[1536] The active site serine (residue 337 in Swiss:P14677) is conserved in all members of this family.

[1537] [1] Pares S, Mouz N, Petillot Y, Hakenbeck R, Dideberg O Nat Struct Biol 1996;3:284-289. 

[1538] 654. Trehalase signatures 

Trehalase (EC 3.2.1.28) is the enzyme responsible for the degradation of the disaccharide alpha,alpha-trehalose
yielding two glucose subunits [1]. It is an enzyme found in a wide variety of organisms and whose sequence has been
highly conserved throughout evolution. Two of the most highly conserved regions have been selected as signature
patterns. The first pattern is located in the central section, the second one is in the C-terminal region.
Consensus pattern: P-G-G-R-F-x-E-x-Y-x-W-D-x-Y
Consensus pattern: Q-W-D-x-P-x-[GA]-W-[PAS]-P

[ 1] Kopp M., Mueller H., Holzer H. J. Biol. Chem. 268:4766-4774(1993). [ 2] Henrissat B., Bairoch A. Biochem. J. 293:
781-788(1993). [E1]
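
Where an entry provides two signature patterns, as for trehalase, a candidate sequence is more convincingly assigned to the family when both are found. The sketch below illustrates that check under stated assumptions: the two patterns are written by hand as regular expressions (x becomes "." and the separators are dropped), the names signature_hits and has_both_signatures are hypothetical, and the test sequence is synthetic, built only to contain both motifs.

```python
import re

# The two trehalase consensus patterns rewritten as regular expressions.
TREHALASE_SIGNATURES = {
    "central": re.compile(r"PGGRF.E.Y.WD.Y"),
    "c_terminal": re.compile(r"QWD.P.[GA]W[PAS]P"),
}


def signature_hits(sequence):
    """Map each signature name to the 1-based start positions of its matches."""
    return {
        name: [m.start() + 1 for m in rx.finditer(sequence)]
        for name, rx in TREHALASE_SIGNATURES.items()
    }


def has_both_signatures(sequence):
    """True only if every signature is found at least once."""
    return all(signature_hits(sequence).values())


# Synthetic sequence (not a real trehalase) carrying both motifs.
synthetic = "MK" + "PGGRFAEAYAWDAY" + "LLLL" + "QWDAPAGWPP" + "K"
print(signature_hits(synthetic))
print(has_both_signatures(synthetic))
print(has_both_signatures("MK" + "PGGRFAEAYAWDAY" + "LLLL"))
```

Reporting positions as well as a yes/no answer makes it possible to confirm that the first motif lies upstream of the second, as the entry describes.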

[1539] 655. Trehalose-6-phosphate synthase domain 

[1540] OtsA (Trehalose-6-phosphate synthase) is homologous to regions in the subunits of yeast trehalose-6-phosphate
synthase/phosphatase complex [1].

[1541] [1] Kaasen I, McDougall J, Strom AR; Gene 1994;145:9-15. 
[1542] 656. Tropomyosins signature 
Pseudomonas denitrificans (cobA) or Methanobacterium ivanovii (gene corA) SUMT is a protein of about 25 to 30 Kd.
In Escherichia coli and related bacteria, the cysG protein, which is involved in the biosynthesis of siroheme, is a
multifunctional protein composed of an N-terminal domain, probably involved in transforming precorrin-2 into siroheme, and
a C-terminal domain which has SUMT activity. The sequence of SUMT is related to that of a number of P. denitrificans
and Salmonella typhimurium enzymes involved in the biosynthesis of cobalamin which also seem to be SAM-dependent
methyltransferases [3,4]. The similarity is especially strong with two of these enzymes: cobI/cbiL which encodes
S-adenosyl-L-methionine--precorrin-2 methyltransferase and cobM/cbiF whose exact function is not known. Two signature
patterns have been developed for these enzymes. The first corresponds to a well conserved region in the
N-terminal extremity (called region 1 in [1,3]) and the second to a less conserved region located in the central part of
these proteins (this pattern spans what are called regions 2 and 3 in [1,3]).

Consensus pattern: [LIVM]-[GS]-[STAL]-G-P-G-x(3)-[LIVMFY]-[LIVM]-T-[LIVM]-[KRHQG]-[AG]
Consensus pattern: V-x(2)-[LI]-x(2)-G-D-x(3)-[FYW]-

[ 1] Blanche F., Robin C., Couder M., Faucher D., Cauchois L., Cameron B., Crouzet J. J. Bacteriol. 173:4637-4645
(1991). [ 2] Robin C., Blanche F., Cauchois L., Cameron B., Couder M., Crouzet J. J. Bacteriol. 173:4893-4896(1991).

[ 3] Crouzet J., Cameron B., Cauchois L., Rigault S., Rouyez M.-C., Blanche F., Thibaut D., Debussche L. J. Bacteriol.
172:5980-5990(1990). [ 4] Roth J.R., Lawrence J.G., Rubenfield M., Kieffer-Higgins S., Church G.M. J. Bacteriol. 175:
3303-3316(1993). [ 5] Mattheakis L.C., Shen W.H., Collier R.J. Mol. Cell. Biol. 12:4026-4037(1992).

[1528] 646. Tudor domain

Domain of unknown function present in several RNA-binding proteins, copies in the Drosophila Tudor protein. Slight
ambiguities in the alignment. Number of members: 18

[1] Tudor domains in proteins that interact with RNA. Ponting CP; Trends Biochem Sci 1997;22:
51-52. [2] Medline: 97157029 The human EBNA-2 coactivator p100: multidomain organization and relationship to the
staphylococcal nuclease fold and to the tudor protein involved in Drosophila melanogaster development. Callebaut I,
Mornon JP; Biochem J 1997;321:125-132.

[1529] 647. Terpene synthase family 

It has been suggested that this gene family be designated tps (for terpene synthase) [1]. It has been split into six
subgroups on the basis of phylogeny, called tpsa-tpsf. tpsa includes vetispiradiene synthase, Swiss:Q39979,
5-epi-aristolochene synthase, Swiss:Q40577, and (+)-delta-cadinene synthase, Swiss:P93665.
tpsb includes (-)-limonene synthase, Swiss:Q40322.
tpsc includes kaurene synthase A, Swiss:O04408.
tpsd includes taxadiene synthase, Swiss:Q41594, pinene synthase,
Swiss:O24475 and myrcene synthase, Swiss:O24474.
tpse includes kaurene synthase B.
tpsf includes linalool synthase.
Number of members: 51
[1]

Medline: 97413772

Monoterpene synthases from grand fir (Abies grandis). cDNA isolation, characterization, and functional expression of
myrcene synthase, (-)-(4S)-limonene synthase, and (-)-(1S,5S)-pinene synthase.
Bohlmann J, Steele CL, Croteau R;
J Biol Chem 1997;272:21784-21792.
[1530] 648. ThiF family 

This family contains a repeated domain in ubiquitin activating enzyme E1 and members of the bacterial
ThiF/MoeB/HesA family. Number of members: 87
[1531] 649. Thioester dehydrase 

Members of this family are involved in fatty acid biosynthesis. 
Number of members: 19
[1]

Medline: 96398612

Structure of a dehydratase-isomerase from the bacterial pathway for biosynthesis of unsaturated fatty acids: two
catalytic activities in one active site.
Leesong M, Henderson BS, Gillig JR, Schwab JM, Smith JL;
Structure 1996;4:253-264.
Database Reference: SCOP; 1mka; fa; [SCOP-USA] [CATH-PDBSUM] 
Database reference: PFAMB; PB058036; 
[1532] 650. Tub family signatures 

The mouse tubby mutation is the cause of maturity-onset obesity, insulin resistance and sensory deficits. This mutation
maps to a gene, tub [1,2], which codes for a protein that belongs to a family which currently consists of the following
to specific tRNA molecules as the first step in protein biosynthesis. In prokaryotic organisms there are at least twenty
different types of aminoacyl-tRNA synthetases, one for each different amino acid. In eukaryotes there are generally 
two aminoacyl-tRNA synthetases for each different amino acid: one cytosolic form and a mitochondrial form. While all 
these enzymes have a common function, they are widely diverse in terms of subunit size and of quaternary structure. 
A few years ago it was found [2] that several aminoacyl-tRNA synthetases share a region of similarity in their N-terminal 
section, in particular the consensus tetrapeptide His-Ile-Gly-His ('HIGH') is very well conserved. The 'HIGH' region has
been shown [3] to be part of the adenylate binding site. The 'HIGH' signature has been found in the aminoacyl-tRNA
synthetases specific for arginine, cysteine, glutamic acid, glutamine, isoleucine, leucine, methionine, tyrosine,
tryptophan, and valine. These aminoacyl-tRNA synthetases are referred to as class-I synthetases [4,5,6] and seem to
share the same tertiary structure based on a Rossmann fold.
Consensus pattern: P-x(0,2)-[GSTAN]-[DENQGAPK]-x-[LIVMFP]-[HT]-[LIVMYAC]-G-[HNTG]-[LIVMFYSTAGPC]

[ 1] Schimmel P. Annu. Rev. Biochem. 56:125-158(1987). [ 2] Webster T., Tsai H., Kula M., Mackie G.A., Schimmel P.
Science 226:1315-1317(1984). [ 3] Brick P., Bhat T.N., Blow D.M. J. Mol. Biol. 208:83-98(1988). [ 4] Delarue M., Moras
D. BioEssays 15:675-687(1993). [ 5] Schimmel P. Trends Biochem. Sci. 16:1-3(1991). [ 6] Nagel G.M., Doolittle R.F.
Proc. Natl. Acad. Sci. U.S.A. 88:8121-8125(1991).

[1548] 661. (tRNA-synt 1c) tRNA synthetases class I (E and Q)

[1549] Other tRNA synthetase sub-families are too dissimilar to be included.

This family includes only glutamyl and glutaminyl tRNA synthetases. 

In some organisms, a single glutamyl-tRNA synthetase aminoacylates both tRNA(Glu) and tRNA(Gln). 

[1550] [1] Rath VL, Silvian LF, Beijer B, Sproat BS, Steitz TA; Structure 1998;6:439-449. 

[1551] 662. (tRNA-synt 1d) tRNA synthetases class I (R) 

[1552] Other tRNA synthetase sub-families are too dissimilar to be included.

This family includes only arginyl tRNA synthetase. 

[1553] 663. Aminoacyl-transfer RNA synthetases class-II signatures (tRNA synt 2)

Aminoacyl-tRNA synthetases (EC 6.1.1.-) [1] are a group of enzymes which activate amino acids and transfer them 
to specific tRNA molecules as the first step in protein biosynthesis. In prokaryotic organisms there are at least twenty 
different types of aminoacyl-tRNA synthetases, one for each different amino acid. In eukaryotes there are generally 
two aminoacyl-tRNA synthetases for each different amino acid: one cytosolic form and a mitochondrial form. While all 
these enzymes have a common function, they are widely diverse in terms of subunit size and of quaternary structure.
The synthetases specific for alanine, asparagine, aspartic acid, glycine, histidine, lysine, phenylalanine, proline, serine,
and threonine are referred to as class-II synthetases [2 to 6] and probably have a common folding pattern in their
catalytic domain for the binding of ATP and amino acid which is different to the Rossmann fold observed for the class
I synthetases [7]. Class-II tRNA synthetases do not share a high degree of similarity; however, at least three conserved
regions are present [2,5,8]. Signature patterns have been derived from two of these regions.
Consensus pattern: [FYH]-R-x-[DE]-x(4,12)-[RH]-x(3)-F-x(3)-[DE]
Consensus pattern: [GSTALVF]-{DENQHRKP}-[GSTA]-[LIVM

[ 1] Schimmel P. Annu. Rev. Biochem. 56:125-158(1987). [ 2] Delarue M., Moras D. BioEssays 15:675-687(1993). [ 3]
Schimmel P. Trends Biochem. Sci. 16:1-3(1991). [ 4] Nagel G.M., Doolittle R.F. Proc. Natl. Acad. Sci. U.S.A. 88:
8121-8125(1991). [ 5] Cusack S., Haertlein M., Leberman R. Nucleic Acids Res. 19:3489-3498(1991). [ 6] Cusack S.
Biochimie 75:1077-1081(1993). [ 7] Cusack S., Berthet-Colominas C., Haertlein M., Nassar N., Leberman R. Nature
347:249-255(1990). [ 8] Leveque F., Plateau P., Dessen P., Blanquet S. Nucleic Acids Res. 18:305-312(1990).
[1554] 664. Aminoacyl-transfer RNA synthetases class-I signature (tRNA synt 1e)

Aminoacyl-tRNA synthetases (EC 6.1.1.-) [1] are a group of enzymes which activate amino acids and transfer them 
to specific tRNA molecules as the first step in protein biosynthesis. In prokaryotic organisms there are at least twenty 
different types of aminoacyl-tRNA synthetases, one for each different amino acid. In eukaryotes there are generally 
two aminoacyl-tRNA synthetases for each different amino acid: one cytosolic form and a mitochondrial form. While all 
these enzymes have a common function, they are widely diverse in terms of subunit size and of quaternary structure. 
A few years ago it was found [2] that several aminoacyl-tRNA synthetases share a region of similarity in their N-terminal 
section, in particular the consensus tetrapeptide His-Ile-Gly-His ('HIGH') is very well conserved. The 'HIGH' region has
been shown [3] to be part of the adenylate binding site. The 'HIGH' signature has been found in the aminoacyl-tRNA
synthetases specific for arginine, cysteine, glutamic acid, glutamine, isoleucine, leucine, methionine, tyrosine,
tryptophan, and valine. These aminoacyl-tRNA synthetases are referred to as class-I synthetases [4,5,6] and seem to
share the same tertiary structure based on a Rossmann fold.

Consensus pattern: P-x(0,2)-[GSTAN]-[DENQGAPK]-x-[LIVMFP]-[HT]-[LIVMYAC]-G-[HNTG]-[LIVMFYSTAGPC]
[ 1] Schimmel P. Annu. Rev. Biochem. 56:125-158(1987). [ 2] Webster T., Tsai H., Kula M., Mackie G.A., Schimmel P.
Science 226:1315-1317(1984). [ 3] Brick P., Bhat T.N., Blow D.M. J. Mol. Biol. 208:83-98(1988). [ 4] Delarue M., Moras
D. BioEssays 15:675-687(1993). [ 5] Schimmel P. Trends Biochem. Sci. 16:1-3(1991). [ 6] Nagel G.M., Doolittle R.F.
Proc. Natl. Acad. Sci. U.S.A. 88:8121-8125(1991).
Tropomyosins [1,2] are a family of closely related proteins present in muscle and non-muscle cells. In striated muscle,
tropomyosin mediates the interactions between the troponin complex and actin so as to regulate muscle contraction.

Muscle isoforms of tropomyosin are characterized by having 284 amino acid residues,
whereas non-muscle forms are generally smaller and are heterogeneous
in their N-terminal region. The signature pattern for tropomyosins is based on a very conserved region in the terminal
section of tropomyosins and which is present in both muscle and non-muscle forms.
Consensus pattern: L-K-E-A-E-x-R-A-E

[ 1] Smillie L.B. Trends Biochem. Sci. 4:151-155(1979). [ 2] McLeod A.R. BioEssays 6:208-212(1986).
[1543] 657. Troponin

^ZlteTTtoT^ ^ **■ ^ ^ (Tn,) " - <™> «. Pfam 

ZSHISS^^ 1 ^ ,r ° POnin ' ^ ' ^ ,0 3C * ^ ,rOP °" in T ** «° ^Vosin. 
Medline: 87144593 

Structure of co-crystals of tropomyosin and troponin. 
White SP, Cohen C, Phillips GN Jr;
Nature 1987;325:826-828. [2] 
Medline: 95155315 

A direct regulatory role for troponin T and a dual role for 
troponin C in the Ca2+ regulation of muscle contraction. 
Potter JD, Sheng Z, Pan BS, Zhao J;

J Biol Chem 1995;270:2557-2562. 

[3]Medline: 95324796 
The troponin complex and regulation of muscle contraction 
Farah CS, Reinach FC; 

FASEB J 1995;9:755-767. 
[1544] 658. (Tryp mucin) Mucin-like glycoprotein 

[1545] This family of trypanosomal proteins resemble vertebrate mucins. The protein consists of three regions. The
N and C termini are conserved between all members of the family, whereas the central region is not well conserved
and contains a large number of threonine residues which can be glycosylated [1].

Indirect evidence suggested that these genes might encode the core protein of parasite mucins, glycoproteins that
were proposed to be involved in the interaction with, and invasion of, mammalian host cells.

[1] Di Noia JM, Sanchez DO, Frasch AC; J Biol Chem 1995;270:24146-24149.

[2] Di Noia JM, D'Orso I, Aslund L, Sanchez DO, Frasch AC; J Biol Chem 1998;273:10843-10850.

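[1546] 659. Aminoacyl-transfer RNA synthetases class-I signature (tRNA synt 1)

Aminoacyl-tRNA synthetases (EC 6.1.1.-) [1] are a group of enzymes which activate amino acids and transfer them
to specific tRNA molecules as the first step in protein biosynthesis. In prokaryotic organisms there are at least twenty
different types of aminoacyl-tRNA synthetases, one for each different amino acid. In eukaryotes there are generally
two aminoacyl-tRNA synthetases for each different amino acid: one cytosolic form and a mitochondrial form. While all
these enzymes have a common function, they are widely diverse in terms of subunit size and of quaternary structure.
A few years ago it was found [2] that several aminoacyl-tRNA synthetases share a region of similarity in their N-terminal
section, in particular the consensus tetrapeptide His-Ile-Gly-His ('HIGH') is very well conserved. The 'HIGH' region has
been shown [3] to be part of the adenylate binding site. The 'HIGH' signature has been found in the aminoacyl-tRNA
synthetases specific for arginine, cysteine, glutamic acid, glutamine, isoleucine, leucine, methionine, tyrosine,
tryptophan, and valine. These aminoacyl-tRNA synthetases are referred to as class-I synthetases [4,5,6] and seem to
share the same tertiary structure based on a Rossmann fold.

[ 1] Schimmel P. Annu. Rev. Biochem. 56:125-158(1987). [ 2] Webster T., Tsai H., Kula M., Mackie G.A., Schimmel P.
Science 226:1315-1317(1984). [ 3] Brick P., Bhat T.N., Blow D.M. J. Mol. Biol. 208:83-98(1988). [ 4] Delarue M., Moras
D. BioEssays 15:675-687(1993). [ 5] Schimmel P. Trends Biochem. Sci. 16:1-3(1991). [ 6] Nagel G.M., Doolittle R.F.
Proc. Natl. Acad. Sci. U.S.A. 88:8121-8125(1991).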
[1547] 660. Aminoacyl-transfer RNA synthetases class-I signature (tRNA synt 1b)

Aminoacyl-tRNA synthetases (EC 6.1.1.-) [1] are a group of enzymes which activate amino acids and transfer them



EP 1 033 405 A2 



Schulz H., Elzinga M. J. Biol. Chem. 265:10424-10429(1990). [ 3] Igual J.C., Gonzalez-Bosch C., Dopazo J.,
Perez-Ortin J.E. J. Mol. Evol. 35:147-155(1992). [ 4] Baker M.E., Billheimer J.T., Strauss J.F. III DNA Cell Biol. 10:695-698
(1991).

[1558] 668. Thioredoxin family active site

Thioredoxins [1 to 4] are small proteins of approximately one hundred amino-acid residues which participate in various
redox reactions via the reversible oxidation of an active center disulfide bond. They exist in either a reduced form or
an oxidized form where the two cysteine residues are linked in an intramolecular disulfide bond. Thioredoxin is present
in prokaryotes and eukaryotes and the sequence around the redox-active disulfide bond is well conserved.
Bacteriophage T4 also encodes a thioredoxin but its primary structure is not homologous to bacterial, plant and vertebrate
thioredoxins. A number of eukaryotic proteins contain domains evolutionarily related to thioredoxin; all of them seem to
be protein disulphide isomerases (PDI). PDI (EC 5.3.4.1) [5,6,7] is an endoplasmic reticulum enzyme that catalyzes
the rearrangement of disulfide bonds in various proteins. The various forms of PDI which are currently known are: -
PDI major isozyme; a multifunctional protein that also functions as the beta subunit of prolyl 4-hydroxylase (EC
1.14.11.2), as a component of oligosaccharyl transferase (EC 2.4.1.119), as thyroxine deiodinase (EC 3.8.1.4), as
glutathione-insulin transhydrogenase (EC 1.8.4.2) and as a thyroid hormone-binding protein. - ERp60 (ER-60; 58 Kd
microsomal protein). ERp60 was originally thought to be a phosphoinositide-specific phospholipase C isozyme and
later to be a protease. - ERp72. - P5. All PDI contain two or three (ERp72) copies of the thioredoxin domain. Bacterial
proteins that act as thiol-disulfide interchange proteins that allow disulfide bond formation in some periplasmic proteins
also contain a thioredoxin domain. These proteins are: - Escherichia coli dsbA (or prfA) and its orthologs in Vibrio
cholerae (tcpG) and Haemophilus influenzae (por). - Escherichia coli dsbC (or xprA) and its orthologs in Erwinia
chrysanthemi and Haemophilus influenzae. - Escherichia coli dsbD (or dipZ) and its Haemophilus influenzae ortholog.
- Escherichia coli dsbE (or ccmG) and orthologs in Haemophilus influenzae, Rhodobacter capsulatus (helX),
Rhizobiaceae (cycY and tlpA).
Consensus pattern: [LIVMF]-[LIVMSTA]-x-[LIVMFYC]-[FYWSTHE]-x(2)-[FYWGTN]-C-[GATPLVE]-[PHYWSTA]-C-x(6)-[LIVMFYWT]
[The two Cs form the redox-active bond]

[ 1] Holmgren A. Annu. Rev. Biochem. 54:237-271(1985). [ 2] Gleason F.K., Holmgren A. FEMS Microbiol. Rev. 54:
271-297(1988). [ 3] Holmgren A. J. Biol. Chem. 264:13963-13966(1989). [ 4] Eklund H., Gleason F.K., Holmgren A.
Proteins 11:13-28(1991). [ 5] Freedman R.B., Hawkins H.C., Murant S.J., Reid L. Biochem. Soc. Trans. 16:96-99(1988).
[ 6] Kivirikko K.I., Myllyla R., Pihlajaniemi T. FASEB J. 3:1609-1617(1989). [ 7] Freedman R.B., Hirst T.R., Tuite M.F.
Trends Biochem. Sci. 19:331-336(1994).
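
When a pattern marks particular residues as functionally important, as the thioredoxin pattern does for its two cysteines, capture groups can report where those residues fall in a matched sequence. The sketch below is illustrative only. It assumes the thioredoxin consensus pattern read with the print damage repaired, the names THIOREDOXIN_SITE and redox_cysteines are hypothetical, and the sequence fragment is an example chosen for demonstration, not data from this document.

```python
import re

# Thioredoxin active-site pattern as a regular expression. The two
# cysteines that form the redox-active bond are given named groups.
THIOREDOXIN_SITE = re.compile(
    r"[LIVMF][LIVMSTA].[LIVMFYC][FYWSTHE].{2}[FYWGTN]"
    r"(?P<cys1>C)[GATPLVE][PHYWSTA](?P<cys2>C).{6}[LIVMFYWT]"
)


def redox_cysteines(sequence):
    """Return (cys1, cys2) 1-based positions for every match in the sequence."""
    return [
        (m.start("cys1") + 1, m.start("cys2") + 1)
        for m in THIOREDOXIN_SITE.finditer(sequence)
    ]


# Illustrative fragment containing a C-x-x-C active-site motif.
fragment = "AILVDFWAEWCGPCKMIAPILDE"
print(redox_cysteines(fragment))

# Replacing both cysteines with serine removes the match.
print(redox_cysteines("AILVDFWAEWSGPSKMIAPILDE"))
```

The two cysteines are always three positions apart in a match, which reflects the C-x-x-C spacing fixed by the pattern.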

[1559] 669. (Transcript fac2) Transcription factor TFIIB repeat signature 

In eukaryotes the initiation of transcription of protein-encoding genes by polymerase II is modulated by general and
specific transcription factors. The general transcription factors operate through common promoter elements (such as
the TATA box). At least seven different proteins associate to form the general transcription factors: TFIIA, -IIB, -IID,
-IIE, -IIF, -IIG, and -IIH [1]. Transcription factor IIB (TFIIB) plays a central role in the transcription of class II genes; it
associates with a complex of TFIID-IIA bound to DNA (DA complex) to form a ternary complex TFIID-IIA-IIB (DAB
complex) which is then recognized by RNA polymerase II [2,3]. TFIIB is a protein of about 315 to 340 amino acid
residues which contains, in its C-terminal part, an imperfect repeat of a domain of about 75 residues. This repeat could
contribute an element of symmetry to the folded protein. The following proteins have been shown to be evolutionarily
related to TFIIB: - An archaebacterial TFIIB homolog. In Pyrococcus woesei a previously undetected open reading
frame has been shown [4] to be highly related to TFIIB. - Fungal transcription factor IIIB 70 Kd subunit (gene
PCF4/TDS4/BRF1) [5]. This protein is a general activator of RNA polymerase III transcription and plays a role analogous
to that of TFIIB in pol III transcription. The central section of the repeated domain, which is the most conserved part of
that domain, has been selected as a signature pattern.
Consensus pattern: G-[KR]-x(3)-[STAGN]-x-[LIVMYA]-[GSTA^^

[1] Weinmann R. Gene Expr. 2:81-91(1992). [2] Hawley D. Trends Biochem. Sci. 16:317-318(1991). [3] Ha I., Lane
W.S., Reinberg D. Nature 352:689-695(1991). [4] Ouzounis C., Sander C. Cell 71:189-190(1992). [5] Khoo B., Brophy
B., Jackson S.P. Genes Dev. 8:2879-2890(1994).
[1560] 670. (transcript fact) MADS-box domain signature and profile

A number of transcription factors contain a conserved domain of 56 amino-acid residues, sometimes known as the
MADS-box domain [E1]. They are listed below. - Serum response factor (SRF) [1], a mammalian transcription factor
that binds to the Serum Response Element (SRE). This is a short sequence of dyad symmetry located 300 bp to the
5' end of the transcription initiation site of genes such as c-fos. - Mammalian myocyte-specific enhancer factors 2A to
2D (MEF2A to MEF2D). These proteins are transcription factors which bind specifically to the MEF2 element present
in the regulatory regions of many muscle-specific genes. - Drosophila myocyte-specific enhancer factor 2 (MEF2). -
Yeast GRM/PRTF protein (gene MCM1) [2], a transcriptional regulator of mating-type-specific genes. - Yeast arginine
metabolism regulation protein I (gene ARGR1 or ARG80). - Yeast transcription factor RLM1. - Yeast transcription factor
SMP1. - Arabidopsis thaliana agamous protein (AG) [3], a probable transcription factor involved in regulating genes
that determine stamen and carpel development in wild-type flowers. Mutations in the AG gene result in the replacement
[1555] 665. Aminoacyl-transfer RNA synthetases class-II signatures (tRNA synt 2b)

Aminoacyl-tRNA synthetases (EC 6.1.1.-) [1] are a group of enzymes which activate amino acids and transfer them
to specific tRNA molecules as the first step in protein biosynthesis. In prokaryotic organisms there are at least twenty
different types of aminoacyl-tRNA synthetases, one for each different amino acid. In eukaryotes there are generally
two aminoacyl-tRNA synthetases for each different amino acid: one cytosolic form and a mitochondrial form. While all
these enzymes have a common function, they are widely diverse in terms of subunit size and of quaternary structure.
The synthetases specific for alanine, asparagine, aspartic acid, glycine, histidine, lysine, phenylalanine, proline, serine
and threonine are referred to as class-II synthetases [2 to 6] and probably have a common folding pattern in their
catalytic domain for the binding of ATP and amino acid which is different to the Rossmann fold observed for the class
I synthetases [7]. Class-II tRNA synthetases do not share a high degree of similarity, however at least three conserved
regions are present [2,5,8]. Signature patterns have been derived from two of these regions.
Consensus pattern: [FYH]-R-x-[DE]-x(4,12)-[RH]-x(3)-F-x(3)-[DE]

Consensus pattern: [GSTALVF]-{DENQHRKP}-[GSTA]-[LIVMF]-[DE]-R-[LIVMF]-x-[LIVMSTAG]-[LIVMFY]

[1] Schimmel P. Annu. Rev. Biochem. 56:125-158(1987). [2] Delarue M., Moras D. BioEssays 15:675-687(1993). [3]
Schimmel P. Trends Biochem. Sci. 16:1-3(1991). [4] Nagel G.M., Doolittle R.F. Proc. Natl. Acad. Sci. U.S.A. 88:
8121-8125(1991). [5] Cusack S., Haertlein M., Leberman R. Nucleic Acids Res. 19:3489-3498(1991). [6] Cusack S.
Biochimie 75:1077-1081(1993). [7] Cusack S., Berthet-Colominas C., Haertlein M., Nassar N., Leberman R. Nature
347:249-255(1990). [8] Leveque F., Plateau P., Dessen P., Blanquet S. Nucleic Acids Res. 18:305-312(1990).

[1556] 666. Thaumatin family signature

Thaumatin [1] is an intensely sweet-tasting protein (100,000 times sweeter than sucrose on a molar basis) from
Thaumatococcus daniellii, an African bush. The protein is made of about 200 residues and contains 8 disulfide bonds.
A number of proteins have been found to be related to thaumatins. These proteins are listed below (references are only
provided for recently determined sequences). - A maize alpha-amylase/trypsin inhibitor. - Two tobacco pathogenesis-related
proteins: PR-R major and minor forms, which are induced after infection with viruses. - Salt-induced protein
NP24 from tomato. - Osmotin, a salt-induced protein from tobacco. - Osmotin-like proteins OSML13, OSML15 and
OSML81 from potato [2]. - P21, a leaf protein from soybean. - PWIR2, a leaf protein from wheat. - Zeamatin, a maize
antifungal protein [3]. The exact biological function of all these proteins is not yet known. A conserved region that includes
three cysteine residues known (in thaumatin) to be involved in disulfide bonds has been selected as a signature pattern.
[Schematic of the disulfide bonds between the conserved cysteines.] 'C': conserved cysteine involved in a disulfide bond. '*': position of the pattern.

Consensus pattern: G-x-[GF]-x-C-x-T-[GA]-D-C-x(1,2)-G-x(2,3)-C

[1] Edens L., Heslinga L., Klok R., Ledeboer A.M., Maat J., Toonen M.Y., Visser C., Verrips C.T. Gene 18:1-12(1982).
[2] Zhu B., Chen T.H.H., Li P.H. Plant Physiol. 108:929-937(1995). [3] Malehorn D.E., Borgmeyer J.R., Smith C.E.,
Shah D.M. Plant Physiol. 106:1471-1481(1994).
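
The consensus patterns in these entries follow the PROSITE notation: elements are separated by '-', 'x' stands for any residue, '[...]' lists allowed residues, '{...}' lists excluded residues, '(n)' or '(n,m)' gives a repeat count, and '<' or '>' anchors the pattern to the N- or C-terminus. As a minimal sketch only, and not part of the described method, such a pattern could be translated into a regular expression as follows. The function names prosite_to_regex and find_signature and the test sequence are hypothetical, chosen for illustration.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style consensus pattern into a Python regex string."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        prefix = suffix = ""
        # '<' anchors the pattern to the N-terminus, '>' to the C-terminus.
        if element.startswith("<"):
            prefix, element = "^", element[1:]
        if element.endswith(">"):
            suffix, element = "$", element[:-1]
        # Optional repeat count, e.g. x(3) or x(1,2).
        repeat = ""
        counted = re.fullmatch(r"(.+?)\((\d+)(?:,(\d+))?\)", element)
        if counted:
            element, low, high = counted.groups()
            repeat = f"{{{low}}}" if high is None else f"{{{low},{high}}}"
        if element == "x":
            core = "."                          # any residue
        elif element.startswith("[") and element.endswith("]"):
            core = element                      # one of the listed residues
        elif element.startswith("{") and element.endswith("}"):
            core = "[^" + element[1:-1] + "]"   # any residue except those listed
        elif len(element) == 1 and element.isalpha():
            core = element                      # a single fixed residue
        else:
            raise ValueError(f"unrecognized pattern element: {element!r}")
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


def find_signature(pattern: str, sequence: str) -> list:
    """Return 1-based start positions of non-overlapping matches in a protein sequence."""
    regex = re.compile(prosite_to_regex(pattern))
    return [match.start() + 1 for match in regex.finditer(sequence.upper())]


# Thaumatin family signature, applied to a made-up sequence (not a real protein).
THAUMATIN = "G-x-[GF]-x-C-x-T-[GA]-D-C-x(1,2)-G-x(2,3)-C"
print(prosite_to_regex(THAUMATIN))
print(find_signature(THAUMATIN, "MKGAGACATGDCAGAAC"))
```

A match only indicates that a sequence is a candidate family member; the patterns are known to pick up some unrelated sequences and to miss some true members.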

[1557] 667. Thiolases signatures 

Two different types of thiolase [1,2,3] are found both in eukaryotes and in prokaryotes: acetoacetyl-CoA thiolase (EC
2.3.1.9) and 3-ketoacyl-CoA thiolase (EC 2.3.1.16). 3-ketoacyl-CoA thiolase (also called thiolase I) has a broad chain-length
specificity for its substrates and is involved in degradative pathways such as fatty acid beta-oxidation. Acetoacetyl-CoA
thiolase (also called thiolase II) is specific for the thiolysis of acetoacetyl-CoA and involved in biosynthetic
pathways such as poly beta-hydroxybutyrate synthesis or steroid biogenesis. In eukaryotes, there are two forms of
3-ketoacyl-CoA thiolase: one located in the mitochondrion and the other in peroxisomes. There are two conserved
cysteine residues important for thiolase activity. The first, located in the N-terminal section of the enzymes, is involved
in the formation of an acyl-enzyme intermediate; the second, located at the C-terminal extremity, is the active site base
involved in deprotonation in the condensation reaction. Mammalian nonspecific lipid-transfer protein (nsL-TP) (also
known as sterol carrier protein 2) is a protein which seems to exist in two different forms: a 14 Kd protein (SCP-2) and
a larger 58 Kd protein (SCP-x). The former is found in the cytoplasm or the mitochondria and is involved in lipid transport;
the latter is found in peroxisomes.
The C-terminal part of SCP-x is identical to SCP-2 while the N-terminal portion is evolutionarily related to thiolases [4].
Three signature patterns have been developed for this family of proteins, two of which are based on the regions around
the biologically important cysteines. The third is based on a highly conserved region in the C-terminal part of these
proteins.

Consensus pattern: [LIVM]-[NST]-x(2)-C-[SAGLI]-[ST]-[SAG]-[LIVMFYNS]-x-[STAG]-[LIVM]-x(6)-[LIVM] [C is involved
in formation of acyl-enzyme intermediate]

Consensus pattern: N-x(2)-G-G-x-[LIVM]-[SA]-x-G-H-P-x-[GA]-x-[ST]-G

Consensus pattern: [AG]-[LIVMA]-[STAGCLIVM]-[STAG]-[LIVMA]-C-x-[AG]-x-[AG]-x-[AG]-x-[SAG] [C is the active site
residue]

[1] Peoples O.P., Sinskey A.J. J. Biol. Chem. 264:15293-15297(1989). [2] Yang S.-Y., Yang X.-Y.H., Healy-Louie G.,
transmembrane anchor. TM2 to TM4: transmembrane regions 2 to 4. 'C': conserved cysteine. '*': position of the pattern.
A conserved region that includes two cysteines and seems to be located in a short cytoplasmic loop between two 
transmembrane domains has been selected as a signature for these proteins. 

Consensus pattern: G-x(3)-[LIVMF]-x(2)-[GSA]-[LIVMF](2)-G-C-x-[GA]-[STA]-x(2)-[EG]-x(2)-[CWN]-[LIVM](2)
[1] Levy S., Nguyen V.Q., Andria M.L., Takahashi S. J. Biol. Chem. 266:14597-14602(1991). [2] Tomlinson M.G., Williams
A.F., Wright M.D. Eur. J. Immunol. 23:136-40(1993). [3] Barclay A.N., Birkeland M.L., Brown M.H., Beyers A.D.,
Davis S.J., Somoza C., Williams A.F. The leucocyte antigen factbooks. Academic Press, London / San Diego, (1993).
[1563] 673. Tryptophan synthase alpha chain signature

Tryptophan synthase catalyzes the last step in the biosynthesis of tryptophan: the conversion of indoleglycerol phosphate
and serine to tryptophan and glyceraldehyde 3-phosphate [1,2]. It has two functional domains: one for the aldol
cleavage of indoleglycerol phosphate to indole and glyceraldehyde 3-phosphate and the other for the synthesis of
tryptophan from indole and serine. In bacteria and plants [3], each domain is found on a separate subunit (alpha and
beta chains), while in fungi the two domains are fused together on a single multifunctional protein. A conserved region
that contains three conserved acidic residues has been selected as a signature pattern for the alpha chain. The first
and the third acidic residues are believed to serve as proton donors/acceptors in the enzyme's catalytic mechanism.
Consensus pattern: [LIVM]-E-[LIVM]-G-x(2)-[FYC]-[ST]-[DE]-[PA]-[LIVMY]-[AGLI]-[DE]-G

[1] Crawford I.P. Annu. Rev. Microbiol. 43:567-600(1989). [2] Hyde C.C., Miles E.W. Bio/Technology 8:27-32(1990).
[3] Berlyn M.B., Last R.L., Fink G.R. Proc. Natl. Acad. Sci. U.S.A. 86:4604-4608(1989).
[1564] 674. Tryptophan synthase beta chain pyridoxal-phosphate attachment site 

Tryptophan synthase catalyzes the last step in the biosynthesis of tryptophan: the conversion of indoleglycerol phosphate
and serine to tryptophan and glyceraldehyde 3-phosphate [1,2]. It has two functional domains: one for the aldol
cleavage of indoleglycerol phosphate to indole and glyceraldehyde 3-phosphate and the other for the synthesis of
tryptophan from indole and serine. In bacteria and plants [3], each domain is found on a separate subunit (alpha and
beta chains), while in fungi the two domains are fused together on a single multifunctional protein. The beta chain of
the enzyme requires pyridoxal-phosphate as a cofactor. The pyridoxal-phosphate group is attached to a lysine residue.
The region around this lysine residue also contains two histidine residues which are part of the pyridoxal-phosphate
binding site. The signature pattern for the tryptophan synthase beta chain is derived from that conserved region.

Consensus pattern: [LIVM]-x-H-x-G-[STA]-H-K-x-N [K is the pyridoxal-P attachment site]

[1] Crawford I.P. Annu. Rev. Microbiol. 43:567-600(1989). [2] Hyde C.C., Miles E.W. Bio/Technology 8:27-32(1990).
[3] Berlyn M.B., Last R.L., Fink G.R. Proc. Natl. Acad. Sci. U.S.A. 86:4604-4608(1989).
[1565] 675. Serine proteases, trypsin family, active sites 

The catalytic activity of the serine proteases from the trypsin family is provided by a charge relay system involving an
aspartic acid residue hydrogen-bonded to a histidine, which itself is hydrogen-bonded to a serine. The sequences in
the vicinity of the active site serine and histidine residues are well conserved in this family of proteases [1]. A partial
list of proteases known to belong to the trypsin family is shown below. - Acrosin. - Blood coagulation factors VII, IX, X,
XI and XII, thrombin, plasminogen, and protein C. - Cathepsin G. - Chymotrypsins. - Complement components C1r,
C1s, C2, and complement factors B, D and I. - Complement-activating component of RA-reactive factor. - Cytotoxic
cell proteases (granzymes A to H). - Duodenase I. - Elastases 1, 2, 3A, 3B (protease E), leukocyte (medullasin). -
Enterokinase (EC 3.4.21.9) (enteropeptidase). - Hepatocyte growth factor activator. - Hepsin. - Glandular (tissue) kallikreins
(including EGF-binding protein types A, B, and C, NGF-gamma chain, gamma-renin, prostate specific antigen
(PSA) and tonin). - Plasma kallikrein. - Mast cell proteases (MCP) 1 (chymase) to 8. - Myeloblastin (proteinase 3)
(Wegener's autoantigen). - Plasminogen activators (urokinase-type, and tissue-type). - Trypsins I, II, III, and IV. - Tryptases.
- Snake venom proteases such as ancrod, batroxobin, cerastobin, flavoxobin, and protein C activator. - Collagenase
from common cattle grub and collagenolytic protease from Atlantic sand fiddler crab. - Apolipoprotein(a). -
Blood fluke cercarial protease. - Drosophila trypsin-like proteases: alpha, easter, snake-locus. - Drosophila protease
stubble (gene sb). - Major mite fecal allergen Der p III. All the above proteins belong to family S1 in the classification
of peptidases [2,E1] and originate from eukaryotic species. It should be noted that bacterial proteases that belong to
family S2A are similar enough in the regions of the active site residues that they can be picked up by the same patterns.
These proteases are listed below. - Achromobacter lyticus protease I. - Lysobacter alpha-lytic protease. - Streptogrisin
A and B (Streptomyces proteases A and B). - Streptomyces griseus glutamyl endopeptidase II. - Streptomyces fradiae
proteases 1 and 2.

Consensus pattern: [LIVM]-[ST]-A-[STAG]-H-C [H is the active site residue]

Consensus pattern: [DNSTAGC]-[GSTAPIMVQH]-x(2)-G-[DE]-S-G-[GS]-[SAPHV]-[LIVMFYWH]-[LIVMFYSTANQH]
[S is the active site residue]

[1] Brenner S. Nature 334:528-530(1988). [2] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:19-61(1994). [E1]
[1566] 676. (tsp) Thrombospondin type 1 domain 
of the stamens by petals and the carpels by a new flower. - Arabidopsis thaliana homeotic proteins Apetala1 (AP1),
Apetala3 (AP3) and Pistillata (PI) which act locally to specify the identity of the floral meristem and to determine sepal
and petal development [4]. - Antirrhinum majus and tobacco homeotic protein deficiens (DEFA) and globosa (GLO)
[5]. Both proteins are transcription factors involved in the genetic control of flower development. Mutations in DEFA or
GLO cause the transformation of petals into sepals and of stamina into carpels. - Arabidopsis thaliana putative transcription
factors AGL1 to AGL6 [6]. - Antirrhinum majus morphogenetic protein DEF H33 (squamosa). In SRF, the
conserved domain has been shown [1] to be involved in DNA-binding and dimerization. A pattern that spans the complete
length of the domain has been derived. The profile also spans the length of the MADS-box.
Consensus pattern: R-x-[RK]-x(5)-I-x-[DNGSK]-x(3)-[KR]-x(2)-T-[FY]-x-[RK](3)-x(2)-[LIVM]-x-K(2)-A-x-E-[LIVM]-
[STA]-x-L-x(4)-[LIVM]-x-[LIVM](3)-x(6)-[LIVMF]-x(2)-[FY]

[1] Norman C., Runswick M., Pollock R., Treisman R. Cell 55:989-1003(1988). [2] Passmore S., Maine G.T., Elble R.,
Christ C., Tye B.-K. J. Mol. Biol. 204:593-606(1988). [3] Yanofsky M., Ma H., Bowman J., Drews G., Feldmann K.A.,
Meyerowitz E.M. Nature 346:35-39(1990). [4] Goto K., Meyerowitz E.M. Genes Dev. 8:1548-1560(1994). [5] Troebner
W., Ramirez L., Motte P., Hue I., Huijser P., Loennig W.-E., Saedler H., Sommer H., Schwartz-Sommer Z. EMBO J.
11:4693-4704(1992). [6] Ma H., Yanofsky M.F., Meyerowitz E.M. Genes Dev. 5:484-495(1991). [E1]
[1561] 671. Transketolase signatures 

Transketolase (EC 2.2.1.1) (TK) catalyzes the reversible transfer of a two-carbon ketol unit from xylulose 5-phosphate
to an aldose receptor, such as ribose 5-phosphate, to form sedoheptulose 7-phosphate and glyceraldehyde 3-phosphate.
This enzyme, together with transaldolase, provides a link between the glycolytic and pentose-phosphate pathways.
TK requires thiamin pyrophosphate as a cofactor. In most sources where TK has been purified, it is a homodimer
of approximately 70 Kd subunits. TK sequences from a variety of eukaryotic and prokaryotic sources [1,2] show that
the enzyme has been evolutionarily conserved. In the peroxisomes of the methylotrophic yeast Hansenula polymorpha,
there is a highly related enzyme, dihydroxy-acetone synthase (DHAS) (EC 2.2.1.3) (also known as formaldehyde transketolase),
which exhibits a very unusual specificity by including formaldehyde amongst its substrates. 1-deoxyxylulose-5-phosphate
synthase (DXP synthase) [3] is an enzyme so far found in bacteria (gene dxs) and plants (gene
CLA1) which catalyzes the thiamin pyrophosphate-dependent acyloin condensation reaction between carbon atoms
2 and 3 of pyruvate and glyceraldehyde 3-phosphate to yield 1-deoxy-D-xylulose-5-phosphate (DXP), a precursor in
the biosynthetic pathway to isoprenoids, thiamin (vitamin B1), and pyridoxol (vitamin B6). DXP synthase is evolutionarily
related to TK. Two regions of TK have been selected as signature patterns. The first, located in the N-terminal section,
contains a histidine residue which appears to function in proton transfer during catalysis [4]. The second, located in the
central section, contains conserved acidic residues that are part of the active cleft and may participate in substrate-binding
[4].

Consensus pattern: R-x(3)-[LIVMTA]-[DENQSTHKF]-x(5,6)-[GSN]-G-H-[PLIVMF]-[GSTA]-x(2)-[LIMC]-[GS]
Consensus pattern: G-[DEQGSA]-[DN]-G-[PAEQ]-[ST]-[HQ]^

[1] Abedinia M., Layfield R., Jones S.M., Nixon P.F., Mattick J.S. Biochem. Biophys. Res. Commun. 183:1159-1166
(1992). [2] Fletcher T.S., Kwee I.L., Nakada T., Largman C., Martin B.M. Biochemistry 31:1892-1896(1992). [3]
Sprenger G.A., Schorken U., Wiegert T., Grolle S., De Graaf A.A., Taylor S.V., Begley T.P., Bringer-Meyer S., Sahm
H. Proc. Natl. Acad. Sci. U.S.A. 94:12857-12862(1997). [4] Lindqvist Y., Schneider G., Ermler U., Sundstroem M. EMBO
J. 11:2373-2379(1992).

[1562] 672. Transmembrane 4 family signature 

Recently a number of eukaryotic cell surface antigens have been found to be evolutionarily related [1,2,3]. The proteins
known to belong to this family are listed below: - Mammalian antigen CD9 (MIC3), a protein involved in platelet activation
and aggregation. - Mammalian leukocyte antigen CD37, expressed on B lymphocytes. - Mammalian leukocyte antigen
CD53 (OX-44), which may be involved in growth regulation in hematopoietic cells. - Mammalian lysosomal membrane
protein CD63 (melanoma-associated antigen ME491; antigen AD1). - Mammalian antigen CD81 (cell surface protein
TAPA-1), which may play an important role in the regulation of lymphoma cell growth. - Mammalian antigen CD82
(protein R2; antigen C33; Kangai 1 (KAI1)), which associates with CD4 or CD8 and delivers costimulatory signals for
the TCR/CD3 pathway. - Mammalian antigen CD151 (SFA-1; platelet-endothelial tetraspan antigen 3 (PETA-3)). -
Mammalian cell surface glycoprotein A15 (TALLA-1; MXS1). - Mammalian novel antigen 2 (NAG-2). - Human tumor-associated
antigen CO-029. - Schistosoma mansoni and japonicum 23 Kd surface antigen (SM23/SJ23). These proteins
share the following characteristics: they all seem to be type III membrane proteins (type III proteins are integral
membrane proteins that contain an N-terminal membrane-anchoring domain which is not cleaved during biosynthesis
and which functions both as a translocation signal and as a membrane anchor); they also contain three additional
transmembrane regions, at least seven conserved cysteine residues, and are of approximately the same size (218
to 284 residues). These proteins are collectively known as the 'transmembrane 4 superfamily' (TM4) because they span
the plasma membrane four times. A schematic diagram of the domain structure of these proteins is shown below.

[Schematic of domain order: TMa, Extra, TM2, Cyt, TM3, Extracellular, TM4, Cyt, with the conserved cysteines (C) and the pattern position (*) marked.] Cyt: cytoplasmic domain. TMa:
the active site residue

[1] Jentsch S., Seufert W., Hauser H.-P. Biochim. Biophys. Acta 1089:127-139(1991). [2] D'andrea A., Pellman D. Crit.
Rev. Biochem. Mol. Biol. 33:337-352(1998). [3] Johnston S.C., Larsen C.N., Cook W.J., Wilkinson K.D., Hill C.P. EMBO
J. 16:3787-3796(1997). [4] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:461-486(1994).
[1577] 682. Ubiquitin carboxyl-terminal hydrolases family 2 signatures (UCH-1) 

Ubiquitin carboxyl-terminal hydrolases (UCH) (deubiquitinating enzymes) [1,2] are thiol proteases that recognize and
hydrolyze the peptide bond at the C-terminal glycine of ubiquitin. These enzymes are involved in the processing of
poly-ubiquitin precursors as well as that of ubiquitinated proteins. There are two distinct families of UCH. The second
class consists of large proteins (800 to 2000 residues) and is currently represented by: - Yeast UBP1, UBP2, UBP3,
UBP4 (or DOA4/SSV7), UBP5, UBP7, UBP9, UBP10, UBP11, UBP12, UBP13, UBP14, UBP15 and UBP16. - Human
tre-2. - Human isopeptidase T. - Human isopeptidase T-3. - Mammalian Ode-1. - Mammalian Unp. - Mouse Dub-1. -
Drosophila fat facets protein (gene faf). - Mammalian faf homolog. - Drosophila D-Ubp-64E. - Caenorhabditis elegans
hypothetical protein R10E11.3. - Caenorhabditis elegans hypothetical protein K02C4.3. These proteins only share two
regions of similarity. The first region contains a conserved cysteine which is probably implicated in the catalytic mechanism.
The second region contains two conserved histidine residues, one of which is also probably implicated in the
catalytic mechanism. Signature patterns for both conserved regions have been developed.

Consensus pattern: G-[LIVMFY]-x(1,3)-[AGC]-[NASM]-x-C-[FYW]-[LIVMC]-[NST]-[SACV]-x-[LIVMS]-Q [C is the putative
active site residue]

Consensus pattern: Y-x-L-x-[SAG]-[LIVMFT]-x(2)-H-x-G-x(4,5)-G-H-Y [The two H's are putative active site residues]
[1] Jentsch S., Seufert W., Hauser H.-P. Biochim. Biophys. Acta 1089:127-139(1991). [2] D'andrea A., Pellman D. Crit.
Rev. Biochem. Mol. Biol. 33:337-352(1998). [3] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:461-486(1994).
[1578] 683. Ubiquitin carboxyl-terminal hydrolases family 2 signatures (UCH-2) 

Ubiquitin carboxyl-terminal hydrolases (UCH) (deubiquitinating enzymes) [1,2] are thiol proteases that recognize and
hydrolyze the peptide bond at the C-terminal glycine of ubiquitin. These enzymes are involved in the processing of
poly-ubiquitin precursors as well as that of ubiquitinated proteins. There are two distinct families of UCH. The second
class consists of large proteins (800 to 2000 residues) and is currently represented by: - Yeast UBP1, UBP2, UBP3,
UBP4 (or DOA4/SSV7), UBP5, UBP7, UBP9, UBP10, UBP11, UBP12, UBP13, UBP14, UBP15 and UBP16. - Human
tre-2. - Human isopeptidase T. - Human isopeptidase T-3. - Mammalian Ode-1. - Mammalian Unp. - Mouse Dub-1. -
Drosophila fat facets protein (gene faf). - Mammalian faf homolog. - Drosophila D-Ubp-64E. - Caenorhabditis elegans
hypothetical protein R10E11.3. - Caenorhabditis elegans hypothetical protein K02C4.3. These proteins only share two
regions of similarity. The first region contains a conserved cysteine which is probably implicated in the catalytic mechanism.
The second region contains two conserved histidine residues, one of which is also probably implicated in the
catalytic mechanism. Signature patterns for both conserved regions have been developed.

Consensus pattern: G-[LIVMFY]-x(1,3)-[AGC]-[NASM]-x-C-[FYW]-[LIVMC]-[NST]-[SACV]-x-[LIVMS]-Q [C is the putative
active site residue]

Consensus pattern: Y-x-L-x-[SAG]-[LIVMFT]-x(2)-H-x-G-x(4,5)-G-H-Y [The two H's are putative active site residues] 
[1] Jentsch S., Seufert W., Hauser H.-P. Biochim. Biophys. Acta 1089:127-139(1991). [2] D'andrea A., Pellman D. Crit.
Rev. Biochem. Mol. Biol. 33:337-352(1998). [3] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:461-486(1994).
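
Because each of the two family 2 signatures marks the residues thought to take part in catalysis (the cysteine in the first, the two histidines in the second), a scan can report the positions of those residues rather than only whether the signature is present. The following is an illustrative sketch under stated assumptions: the regular expressions are hand translations of the two consensus patterns above, and the function name putative_active_sites and the example sequence are hypothetical.

```python
import re

# Hand translation of the first signature; the capture group marks the cysteine.
CYS_SIGNATURE = re.compile(
    r"G[LIVMFY].{1,3}[AGC][NASM].(C)[FYW][LIVMC][NST][SACV].[LIVMS]Q"
)
# Hand translation of the second signature; the capture groups mark the two histidines.
HIS_SIGNATURE = re.compile(r"Y.L.[SAG][LIVMFT].{2}(H).G.{4,5}G(H)Y")


def putative_active_sites(sequence: str) -> dict:
    """Return 1-based positions of the putative catalytic Cys and His residues."""
    sequence = sequence.upper()
    sites = {"C": [], "H": []}
    for match in CYS_SIGNATURE.finditer(sequence):
        sites["C"].append(match.start(1) + 1)
    for match in HIS_SIGNATURE.finditer(sequence):
        sites["H"].extend([match.start(1) + 1, match.start(2) + 1])
    return sites


# Made-up sequence containing one copy of each signature (not a real protein).
example = "MK" + "GLAANACFLNSALQ" + "PPPP" + "YALASLAAHAGAAAAGHY" + "KK"
print(putative_active_sites(example))
```

A sequence that carries both signatures is a stronger candidate for the family than one that carries only one of them, since the proteins listed above share both conserved regions.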
[1579] 684. UDP-glycosyltransferases signature

UDP glycosyltransferases (UGT) are a superfamily of enzymes that catalyze the addition of the glycosyl group from
a UDP-sugar to a small hydrophobic molecule. This family currently consists of: - Mammalian UDP-glucuronosyl transferases
(UDPGT) [1,2]. A large family of membrane-bound microsomal enzymes which catalyze the transfer of glucuronic
acid to a wide variety of exogenous and endogenous lipophilic substrates. These enzymes are of major importance
in the detoxification and subsequent elimination of xenobiotics such as drugs and carcinogens. - A large
number of putative UDPGT from Caenorhabditis elegans. - Mammalian 2-hydroxyacylsphingosine 1-beta-galactosyltransferase
[3] (also known as UDP-galactose-ceramide galactosyltransferase). This enzyme catalyzes the transfer
of galactose to ceramide, a key enzymatic step in the biosynthesis of galactocerebrosides, which are abundant sphingolipids
of the myelin membrane of the central nervous system and peripheral nervous system. - Plants flavonol O(3)-glucosyltransferase.
An enzyme [4] that catalyzes the transfer of glucose from UDP-glucose to a flavonol. This reaction
is essential and one of the last steps in anthocyanin pigment biosynthesis. - Baculoviruses ecdysteroid UDP-glucosyltransferase
(EC 2.4.1.-) [5] (egt). This enzyme catalyzes the transfer of glucose from UDP-glucose to ecdysteroids
which are insect molting hormones. The expression of egt in the insect host interferes with the normal insect development
by blocking the molting process. - Prokaryotic zeaxanthin glucosyl transferase (gene crtX), an enzyme involved
in carotenoid biosynthesis and that catalyses the glycosylation reaction which converts zeaxanthin to zeaxanthin-beta-diglucoside.
- Streptomyces macrolide glycosyltransferases [6]. These enzymes specifically inactivate macrolide antibiotics
via 2'-O-glycosylation using UDP-glucose. These enzymes share a conserved domain of about 50 amino acid
residues located in their C-terminal section and from which a pattern has been extracted to detect them.
Consensus pattern: [FW]-x(2)-Q-x(2)-[LIVMYA]-[LIMV]-x(4,6)-[LVGAC]- [LVFYA]-[LIVMF]-[STAGCM]-[HNQ]- 
[1567] [1] Bork P; FEBS Lett 1993;327:125-130.

[1568] 677. Tubulin subunits alpha, beta, and gamma signature 

Tubulins [1,2], the major constituent of microtubules, are dimeric proteins which consist of two closely related [...]
GTP is hydrolyzed during incorporation into the microtubule. Near the E site is an invariant region rich in [...] which
is found in both chains and which is now [3] said to control the access of the [...]
of closely related alpha and beta isotypes. In most species there is a [...]
Gamma tubulin is found at microtubule organizing centers (MTOC) such as the [...] centrosome, suggesting
that it is involved in the minus-end nucleation of microtubule assembly.

Consensus pattern: [SAG]-G-G-T-G-[SA]-G 

[1] Cleveland D.W., Sullivan K.F. Annu. Rev. Biochem. 54:331-365(1985). [2] Joshi H.C., Cleveland D.W. Cell Motil.
[1569] Tubulin-beta mRNA autoregulation signal 

The stability of beta-tubulin mRNAs is autoregulated by their own translation product [1]. Unpolymerized tubulin
subunits bind directly (or activate a factor(s) which binds co-translationally) [...]
binding is transduced through the adjacent ribosomes to activate [...]
The recognition element has been shown to be the first four amino acids of beta-tubulin: [...]
to this sequence abolish the autoregulation effect (except for the replacement of Glu by Asp); transposition of this
sequence to an internal region of a polypeptide also suppresses the autoregulatory effect.
Consensus pattern: <M-R-[DE]-[IL]
[1] Cleveland D.W. Trends Biochem. Sci. 13:339-343(1988).
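
The leading '<' in this consensus pattern restricts the match to the N-terminus of the polypeptide, which reflects the statement that moving the sequence to an internal position suppresses autoregulation. A minimal sketch of that positional check is given below. The regular expression is a hand translation of the pattern, and the function name has_autoregulation_signal and the test strings are illustrative assumptions.

```python
import re

# '^' plays the role of the PROSITE '<' anchor: the match must begin at residue 1.
AUTOREG_SIGNAL = re.compile(r"^MR[DE][IL]")


def has_autoregulation_signal(protein: str) -> bool:
    """True if the first four residues fit the pattern <M-R-[DE]-[IL]."""
    return AUTOREG_SIGNAL.match(protein.upper()) is not None


# Illustrative strings only.
print(has_autoregulation_signal("MREIVHIQAG"))    # signal at the N-terminus
print(has_autoregulation_signal("AAMREIVHIQAG"))  # same residues at an internal position
```

Without the anchor, the second string would also match, so the anchor is what encodes the positional requirement.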

Aminoacyl-tRNA synthetases (EC 6.1.1.-) [1] are a group of enzymes which activate amino acids and transfer them
to specific tRNA molecules as the first step in protein biosynthesis. In prokaryotic organisms there are at least twenty
different types of aminoacyl-tRNA synthetases, one for each different amino acid. In eukaryotes there are generally
two aminoacyl-tRNA synthetases for each different amino acid: one cytosolic form and a mitochondrial form. While all
these enzymes have a common function, they are widely diverse in terms of subunit size and of quaternary structure.
The synthetases specific for alanine, asparagine, aspartic acid, glycine, histidine, lysine, phenylalanine, proline, serine
and threonine are referred to as class-II synthetases [2 to 6] and probably have a common folding pattern in their
catalytic domain for the binding of ATP and amino acid which is different to the Rossmann fold observed for the class
I synthetases [7]. Class-II tRNA synthetases do not share a high degree of similarity, however at least three conserved
regions are present [2,5,8]. Signature patterns have been derived from two of these regions.

Consensus pattern: [FYH]-R-x-[DE]-x(4,12)-[RH]-x(3)-F-x(3)-[DE]

[1] Schimmel P. Annu. Rev. Biochem. 56:125-158(1987). [2] Delarue M., Moras D. BioEssays 15:675-687(1993).
[3] Schimmel P. Trends Biochem. Sci. 16:1-3(1991). [4] Nagel G.M., Doolittle R.F. Proc. Natl. Acad. Sci. U.S.A.
88:8121-8125(1991). [5] Cusack S., Haertlein M., Leberman R. Nucleic Acids Res. 19:3489-3498(1991). [6] Cusack

[...] Structure of a human DNA repair protein UBA domain [...] Dieckmann T, Withers-Ward
ES, Jarosinski MA, Liu CF, Chen IS, Feigon J; Nat Struct Biol 1998;5:1042-1047.
[1575] 680. UBX domain 

[...] Present in FAF1 and Shp1. Number of members: 19

[1576] 681. (UCH) Ubiquitin carboxyl-terminal hydrolases family 1 cysteine active site

Ubiquitin carboxyl-terminal hydrolases (UCH) (deubiquitinating enzymes) [1,2] are thiol proteases that recognize and
hydrolyze the peptide bond at the C-terminal glycine of ubiquitin. These enzymes are involved in the processing of
poly-ubiquitin precursors as well as that of ubiquitinated proteins. There are two distinct families of UCH. The first class
consists of enzymes of about 25 Kd and is currently represented by: - Mammalian [...]
last potential transmembrane region has been selected as a signature pattern.

Consensus pattern: G-[STIF]-V-x(2)-[LIVM]-x(6)-[LIVMF]-x(3)-[DQ]-x(3)-[LIV]-x-[LIV]-P-N-x(2)-[LIVMF]-[LIVFSTA]-x(5)-N

[1] Bairoch A. Unpublished observations (1997).

[1586] 689. Uncharacterized protein family UPF0004 signature

The following uncharacterized proteins have been shown [1] to share regions of similarities: - Escherichia coli hypothetical
protein yliG. - Escherichia coli hypothetical protein yleA and HI0019, the corresponding Haemophilus influenzae
protein. - Bacillus subtilis hypothetical protein yqeV. - Helicobacter pylori hypothetical protein HP0269. - Helicobacter
pylori hypothetical protein HP0285. - Mycoplasma iowae hypothetical protein in 16S RNA 5'region. - Mycobacterium
leprae hypothetical protein B2235_C2_195. - Pseudomonas aeruginosa hypothetical protein in hemL 3'region. - Synechocystis
strain PCC 6803 hypothetical protein slr0082. - Synechocystis strain PCC 6803 hypothetical protein
sll0996. - Methanococcus jannaschii hypothetical protein MJ0865. - Methanococcus jannaschii hypothetical protein
MJ0867. - Caenorhabditis elegans hypothetical protein F25B5.5. The size of these proteins ranges from 47 to 61 Kd.
They contain six conserved cysteines, three of which are clustered in a region that can be used as a signature pattern.

Consensus pattern: [LIVM]-x-[LIVMT]-x(2)-G-C-x(3)-C-[STAN]-[FY]-C-x-[LIVM]-x(4)-G
[1] Bairoch A. Unpublished observations (1997). 
[1587] 690. Uncharacterized protein family UPF0005 signature 

The following proteins seem to be evolutionarily related [1]: - Mammalian protein TEGT (Testis Enhanced Gene Transcript).
- Escherichia coli hypothetical protein yccA and HI0044, the corresponding Haemophilus influenzae protein. -
A probable Pseudomonas aeruginosa ortholog of yccA. These are proteins of about 25 Kd which seem to contain
seven transmembrane domains. A signature pattern that corresponds to a region that starts with the beginning of the
third transmembrane domain and ends in the middle of the fourth one has been developed.

Consensus pattern: G-[LIVM](2)-[SA]-x(5,8)-G-x(2)-[LIVM]-G-P-x-L-x(4)-[SAG]-x(4,6)-[LIVM](2)-x(2)-A-x(3)-T-A-
[LIVM](2)-F

[1] Walter L., Marynen P., Szpirer J., Levan G., Guenther E. Genomics 28:301-304(1995).
[1588] 691. Uncharacterized protein family UPF0006 signatures

The following uncharacterized proteins have been shown [1] to share regions of similarities: - Yeast chromosome II
hypothetical protein YBL055c. - Escherichia coli hypothetical protein ycfH and HI0454, the corresponding Haemophilus
influenzae protein. - Escherichia coli hypothetical protein yigW. - Escherichia coli hypothetical protein yjjV and HI0081,
the corresponding Haemophilus influenzae protein. - Bacillus subtilis hypothetical protein yabD. - Haemophilus influenzae
hypothetical protein HI1664. - Mycoplasma genitalium hypothetical protein MG009. These are proteins of from
24 to 47 Kd which contain a number of conserved regions. They can be picked up in the database by the following
patterns.

Consensus pattern: [LIVMFY](2)-D-[STA]-H-x-H-[LIVMF]-[DN]
Consensus pattern: P-[LIVM]-x-[LIVM]-H-x-R-x-[TA]-x-[DE]

Consensus pattern: [LVSA]-[LIVA]-x(2)-[LIVM]-[PS]-x(3)-L-[LIVM]-[LIVMS]-E-T-D-x-P
[1] Bairoch A., Rudd K.E. Unpublished observations (1995).
[1589] 692. Uncharacterized protein family UPF0007 signature

The following proteins seem to be evolutionarily related [1]: - Escherichia coli hypothetical protein ygbP and HI0672,
the corresponding Haemophilus influenzae protein. - Bacillus subtilis hypothetical protein yacM. - Mycobacterium
tuberculosis hypothetical protein MtCY06G11.29c. - Synechocystis strain PCC 6803 hypothetical protein slr0951. - A
Rhodobacter capsulatus hypothetical protein in nifR3 5' region. Except for the Rhodobacter protein, which contains a
C-terminal extension, all these proteins have from 225 to 236 amino acids. They are hydrophilic proteins that can be
picked up in the database by the following pattern.
Consensus pattern: V-L-[IV]-H-D-[GA]-A-R

[1] Bairoch A. Unpublished observations (1997).

[1590] 693. Uncharacterized protein family UPF0015 signature 

The following uncharacterized proteins have been shown [1] to share regions of similarities: - Yeast chromosome II
hypothetical protein YBR002c. - Yeast chromosome XIII hypothetical protein YMR101c. - Escherichia coli hypothetical
protein yaeU and HI0920, the corresponding Haemophilus influenzae protein. - Helicobacter pylori hypothetical protein
HP1?21. - Mycobacterium leprae hypothetical protein B1937_F2_65. - A Corynebacterium glutamicum hypothetical
protein in aroF 3' region. - A Streptomyces fradiae hypothetical protein in transposon Tn4556. - Synechocystis strain
PCC 6803 hypothetical protein sll0505. - Methanococcus jannaschii hypothetical protein MJ1372. These are proteins
of about 26 to 40 Kd whose central region is well conserved. They can be picked up in the database by the following
pattern.

Consensus pattern: [DE]-[LIVMF](3)-R-T-[SG]-G-x(2)-R-x-S-x-[FY]-[LIVM](2)-W-Q-

[1] Wolfe K.H., Lohan A.J.E. Yeast 10:S41-S46(1994).

[1591] 694. Uncharacterized protein family UPF0016 signature
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ISTAGCJ^-x(2)-[STAG]-x(3HSTAGL]-[LIVMFA]-x(4)-[PQR]-[LIVMT]-x(3)-[PA]-x(3WDESWQEHNl 

r S ™ e ^ 5:1110 " ni2(1989) [6lHernandezC ' OlanoC ' Mendezc. Salas J A Gene 134139 140,199* 
[1580] 685. UDP-glucose/GDP-mannose dehydrogenase family

S 5 ? 1 I1 I Punfication 30(4 characterization of guanosine diphospho-D-mannose dehydrogenase A kev enzvm* ■„ 
the b.osynthes.s of alginate by Pseudomonas aeruginosa. Roychoudhuiy S May TB SlZh 4 / 
Chakrabarty AM; J BiolChem 1989264 93SO-QW m p^XoITl . I ' gh SK ' Fe,n 9°W DS, 

[1583] 686. Uracil-DNA glycosylase signature

««« r 7, V? ,, i'l J ~ D " AlVa ' A - S ' slu PP hau 9 G.. Kavfi B., Alseth I., Krohan H E Tainer J A Cell fifrttum 

donnaS. J. Biol. Chem. 268:1310-1319(1993) riOlRam^n F i ^nt cTTn . M ^ Multer SJ " Cara " 
(1 993) yil y9d M 1 °J ^mes D-E.. Ltndahl T, Sedgwick B. Cum Opin. Cell Bioi. 5:424-433 

[1584] 687. Uncharacterized protein family UPF0001 signature

The following uncharacterized proteins have been shown [1] to share regions of similarities:

gen which is located in the first third of these proteins has been selected as a signature pattern.

Consensus pattern: [FW]-H-[FM]-[IV]-G-x-[LIV]-Q-x-[NKR]-K-x(3)-[LIV]
[1] Bairoch A., Rudd K.E. Unpublished observations (1996).
[1585] 688. Uncharacterized protein family UPF0003 signature
contain at least six or seven transmembrane regions. A conserved region located in the central section of these proteins
has been developed as a signature pattern.

Consensus pattern: Y-x(2)-F-[LIVMA](2)-x-L-x(4)-G-x(2)-F-[EQ]-[LIVMF]-P-[LIVM]-
[1] Bairoch A., Rudd K.E. Unpublished observations (1996).

[1602] 702. Uncharacterized protein family UPF0034 signature 

The following uncharacterized proteins have been shown [1] to share regions of similarities: - Escherichia coli hypo- 
thetical protein yhdG and HI0979, the corresponding Haemophilus influenzae protein. - Escherichia coli hypothetical 
protein yjbN and HI0634, the corresponding Haemophilus influenzae protein. - Escherichia coli hypothetical protein 
yohI and HI0270, the corresponding Haemophilus influenzae protein. - Bacillus subtilis hypothetical protein yacF. -
Rhodobacter capsulatus protein nifR3 and related proteins in Azospirillum brasilense and Rhizobium leguminosarum.
- Synechocystis strain PCC 6803 hypothetical protein slr0644. - Synechocystis strain PCC 6803 hypothetical protein
sll0926. - Caenorhabditis elegans hypothetical protein C45G9.2. - Yeast protein SMM1. - Yeast hypothetical protein
YLR401C. - Yeast hypothetical protein YLR405w. - Yeast hypothetical protein YML080w. Although it has been proposed 
[2] that Rhodobacter capsulatus nifR3 is a transcriptional regulatory protein, it is believed that these proteins constitute 
a family of enzymes whose active site could include a conserved cysteine which has been used as the central part of 
a signature pattern. 

Consensus pattern: [LIVM]-[DNG]-[LIVM]-N-x-G-C-P-x(3)-[LIVMASQ]-x(5)-G-[SAC]

[1] Bairoch A., Rudd K.E. Unpublished observations (1995). [2] Foster-Hartnett D., Cullen P.J., Gabbert K.K., Kranz
R.G. Mol. Microbiol. 8:903-914(1993).

[1603] 703. Uncharacterized protein family UPF0038 signature 

The following uncharacterized proteins have been shown [1] to share regions of similarities: - Escherichia coli
hypothetical protein yacE and HI0890, the corresponding Haemophilus influenzae protein. - Mycobacterium tuberculosis
hypothetical protein MtCY01B2.23 and O410, the corresponding Mycobacterium leprae protein. - Synechocystis strain
PCC 6803 hypothetical protein slr0553. - Other hypothetical proteins from Aeromonas hydrophila, Bacteroides
nodosus, Neisseria gonorrhoeae, Pseudomonas putida, Thermus thermophilus and Xanthomonas campestris. - Human
hypothetical protein pOV-2. - Yeast hypothetical protein YDR196C. - Caenorhabditis elegans hypothetical protein
T05G5.5. These proteins all contain, in their N-terminal extremity, an ATP/GTP-binding motif 'A' (P-loop) (see
<PDOC00017>). The size of these proteins ranges from 200 to 290 residues (with the exception of the Mycobacterial
sequences, which are 410 residues long). A conserved region some 50 residues away from the ATP-binding P-loop
has been developed as a signature pattern.

Consensus pattern: G-x-[LI]-x-R-x(2)-L-x(4)-F-x(8)-[LIV]-x(5)-P-x-[LIV]-
[1] Rudd K.E., Bairoch A. Unpublished observations (1997).

[1604] 704. Ubiquitin-conjugating enzymes active site

Ubiquitin-conjugating enzymes (UBC or E2 enzymes) [1,2,3] catalyze the covalent attachment of ubiquitin to target
proteins. An activated ubiquitin moiety is transferred from a ubiquitin-activating enzyme (E1) to E2, which later ligates
ubiquitin directly to substrate proteins with or without the assistance of 'N-end' recognizing proteins (E3). In most
species there are many forms of UBC (at least 9 in yeast) which are implicated in diverse cellular functions. A cysteine
residue is required for ubiquitin-thiolester formation. There is a single conserved cysteine in UBCs and the region
around that residue is conserved in the sequence of known UBC isozymes. That region has been used as a signature
pattern.

Consensus pattern: [FYWLSP]-H-[PC]-[NH]-[LIV]-x(3,4)-G-x-[LIV]-C-[LIV]-x-[LIV] [C is the active site residue]
[1] Jentsch S., Seufert W., Sommer T., Reins H.-A. Trends Biochem. Sci. 15:195-198(1990). [2] Jentsch S., Seufert
W., Hauser H.-P. Biochim. Biophys. Acta 1089:127-139(1991). [3] Hershko A. Trends Biochem. Sci. 16:265-268(1991).
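
Because the ubiquitin-conjugating enzyme signature marks one position (the cysteine) as the active site residue, a scan can report the position of that residue rather than only the start of the match. The following Python sketch is offered as an illustration and is not part of the original disclosure. The names UBC_ACTIVE_SITE and active_site_cysteines are hypothetical, the regular expression is a hand translation of the consensus pattern [FYWLSP]-H-[PC]-[NH]-[LIV]-x(3,4)-G-x-[LIV]-C-[LIV]-x-[LIV], and the test sequence is invented.

```python
import re
from typing import List

# Hand translation of the consensus pattern; the named group captures the
# cysteine that the entry designates as the active site residue.
UBC_ACTIVE_SITE = re.compile(
    r"[FYWLSP]H[PC][NH][LIV].{3,4}G.[LIV](?P<active_cys>C)[LIV].[LIV]"
)


def active_site_cysteines(sequence: str) -> List[int]:
    """Return 1-based positions of the cysteine in every signature match."""
    return [m.start("active_cys") + 1 for m in UBC_ACTIVE_SITE.finditer(sequence)]


toy_sequence = "MSFHPNIAAAGAICLAI"  # hypothetical sequence, for illustration only
print(active_site_cysteines(toy_sequence))  # prints [14]
```

A sequence without a match yields an empty list, which keeps the caller free of special cases.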
[1605] 705. Uroporphyrinogen decarboxylase signatures 

Uroporphyrinogen decarboxylase (URO-D), the fifth enzyme of the heme biosynthetic pathway, catalyzes the sequential
decarboxylation of the four acetyl side chains of uroporphyrinogen to yield coproporphyrinogen [1]. URO-D deficiency
is responsible for the human genetic diseases familial porphyria cutanea tarda (fPCT) and hepatoerythropoietic
porphyria (HEP). The sequence of URO-D has been well conserved throughout evolution. The best conserved region is
located in the N-terminal section; it contains a perfectly conserved hexapeptide. There are two arginine residues in this
hexapeptide which could be involved in the binding, via salt bridges, to the carboxyl groups of the propionate side chains
of the substrate. This region has been used as a signature pattern. A second signature pattern is based on another
well conserved region which is located in the central section of the protein.
Consensus pattern: P-x-W-x-M-R-Q-A-G-R 

Consensus pattern: G-F-[STAGCV]-[STAGC]-x-P-[FYW]-T-[LV]-x(2)-Y-x(2)-[AE]-[GK]

[1] Garey J.R., Labbe-Bois R., Chelstowska A., Rytka J., Harrison L., Kushner J., Labbe R. Eur. J. Biochem.
205:1011-1016(1992).

[1606] 706. ubiE/COQ5 methyltransferase family signatures

The following methyltransferases have been shown [1] to share regions of similarities: - Escherichia coli ubiE, which 
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6803h y po,he,ic*,p^^^ 

or seven transmembrane domains. A conseived re^^ sZt ^l ^S^ ? T^™ ,0 ^ S ' X 
fClow the second transmembrane domain has been J^^^ 1 ^ ,0direc <* 

Consensus pattern: E-[LIVM]-G-D-K-T-F-[LIVMF](2)-A-



[1] Bairoch A. Unpublished observations (1996).
[1592] 695. Uncharacterized protein family UPF0021 signature 




proteins MJ1157 and MJl478.These are proteins of hZ^mZ £ „ M Jf anococcus Jannaschii hypothetical 



[1] Bairoch A. Unpublished observations (1997).
[1593] 696. Uncharacterized protein family UPF0023 signature




anococcus jannaschii hySthetSprSS^ ^M^ITTT J ^ h *P° ,hetica| P ro ^ W06E11.4. -Meth- 
up in the database by ^*ST m " ** * °' ^ 30 Kd ^ 03,1 be 

[1594] Consensus pattern: D-x-D-E-[LIV]-L-x(4)-V-F-x(3)-S-K-G
[1595] [1] Bairoch A. Unpublished observations (1997).

Haernopi^ 

ical protein YOR243c. - Caenorhabditis e.egans hypothS^^ 

ical proteins MJ0588 and MJ1364 These are hvZnwr! . , . 24 - 11 - 'Methanococcus jannaschii hypothet- 
database by the following ptttem ***** °' ^ 39 ,0 77 Kd ^ «" * *** up in the 

[1597] Consensus pattern: G-x-K-O^RJ-x-A-fLVJ-T-x-Q-x^UVFJ-fSGCl- 
[1] Bairoch A. Unpublished observations (1997).
[1598] 698. Uncharacterized protein family UPF0025 signature 

" S ^ ° f - «. hypo- 

ical proL MGM7 - SSS^SES^l^r^ ' 9 entta,ium — P~umonh. hypoS- 

M7r n ^^f^' D " V4U ^- X(2) ^- H ^- H - x ( 12 )*IVMF]-N-P-G ,n9Pat,em - 

[1] Bairoch A. Unpublished observations (1997).

[1599] 699. Uncharacterized protein family UPF0029 signature 

[1600] 700. Uncharacterized protein family UPF0030 signature 
YNL334C. - Bacillus subttts n^hTS 

anococcus jannaschii hyprthSSS^i^ " ' ^ h VP° the,ical P«>tein HI1648. - Meth- 

picked up inTe tatJ^^Z^™ are h ^ hi,fc of about 19 to 25 Kd. They can be 

Consensus pattern: [GA]-L-I-[LIV]-P-G-G-E-S-T-[STA]

[1] Bairoch A. Unpublished observations (1997).

[1601] 701. Uncharacterized protein family UPF0032 signature

bacterium leprae protein. - Synechocyslis strain Per f5^Q k„~Lk r , U2126A. the corresponding Myco- 
the sequence. A profile was also developed that spans the complete length of the ubiquitin domain.
Consensus pattern: K-x(2)-[LIVM]-x-[DESAK]-x(3^ 

[1] Jentsch S., Seufert W., Hauser H.-P. Biochim. Biophys. Acta 1089:127-139(1991). [2] Monia B.P., Ecker D.J., Crooke
S.T. Biotechnology 8:209-215(1990). [3] Finley D., Varshavsky A. Trends Biochem. Sci. 10:343-347(1985). [4] Filippi
M., Tribioli C., Toniolo D. Genomics 7:453-457(1990). [5] Olvera J., Wool I.G. J. Biol. Chem. 268:17967-17974(1993).
[6] Kumar S., Yoshida Y., Noda M. Biochem. Biophys. Res. Commun. 195:393-399(1993). [7] Jones D., Candido
E.P. J. Biol. Chem. 268:19545-19551(1993). [8] Melnick L., Sherman F. J. Mol. Biol. 233:372-388(1993).
[1611] 710. VHS domain

[1612] Domain present in VPS-27, Hrs and STAM. Number of members: 27

[1613] 711. Vinculin family signatures

Vinculin [1] is a eukaryotic protein that seems to be involved in the attachment of the actin-based microfilaments to the
plasma membrane. Vinculin is located at the cytoplasmic side of focal contacts or adhesion plaques. In addition to actin,
vinculin interacts with other structural proteins such as talin and alpha-actinins. Vinculin is a large protein of 116 Kd
(about 1000 residues). Structurally the protein consists of an acidic N-terminal domain of about 90 Kd separated
from a basic C-terminal domain of about 25 Kd by a proline-rich region of about 50 residues. The central part of the
N-terminal domain consists of a variable number (3 in vertebrates, 2 in Caenorhabditis elegans) of repeats of a 110
amino acid domain. Catenins [2] are proteins that associate with the cytoplasmic domain of a variety of cadherins.
The association of catenins with cadherins produces a complex which is linked to the actin filament network, and which
seems to be of primary importance for the cell-adhesion properties of cadherins. Three different types of catenins seem to
exist: alpha, beta, and gamma. Alpha-catenins are proteins of about 100 Kd which are evolutionarily related to vinculin.
In terms of their structure the most significant differences are the absence, in alpha-catenin, of the repeated domain and
of the proline-rich segment. Two signature patterns for this family of proteins have been developed. The first pattern is
located in the N-terminal section of both vinculin and alpha-catenins and is part, in vinculin, of a domain that seems to
be involved in the interaction with talin. The second pattern is based on a conserved region in the N-terminal part of
the repeated domain of vinculin.

Consensus pattern: [KR]-x-[LIVMF]-x(3)-[LIVMA]-x(2)-[LIVM]-x(6)-R-Q-Q-E-L
Consensus pattern: [LIVM]-x-[QA]-A-x(2)-W-[IL]-x-[DN]-P

[1] Otto J.J. Cell Motil. Cytoskeleton 16:1-6(1990). [2] Herrenknecht K., Ozawa M., Eckerskorn C., Lottspeich F., Lenter
M., Kemler R. Proc. Natl. Acad. Sci. U.S.A. 88:9156-9160(1991).
[1614] 712. (Vitellogenin N) Lipoprotein amino terminal region

[1615] This family contains regions from: Vitellogenin, Microsomal triglyceride transfer protein and apolipoprotein
B-100. These proteins are all involved in lipid transport [1]. This family contains the LV1n chain from lipovitellin, that
contains two structural domains. Number of members: 33

[1616] [1] The structural basis of lipid interactions in lipovitellin, a soluble lipoprotein. Anderson TA, Levitt DG,
Banaszak LJ; Structure 1998;6:895-909.

[1617] 713. (vMSA) Major surface antigen from hepadnavirus

[1618] 714. ssDNA binding protein (Viral DNA bp) 

This protein is found in herpesviruses and is needed for replication. 

[1619] 715. (Voltage CLC) Voltage gated chloride channels
[1620] This family of ion channels contains 10 or 12 transmembrane helices. Each protein forms a single pore. It
has been shown that some members of this family form homodimers. These proteins contain two CBS domains.

[1] Schmidt-Rose T, Jentsch TJ; J Biol Chem 1997;272:20515-20521.

[2] Zhang J, George AL Jr, Griggs RC, Fouad GT, Roberts J, Kwiecinski H, Connolly AM, Ptacek LJ; Neurology
1996;47:993-998.

[1621] 716. von Willebrand factor type A domain (vwa) 

More von Willebrand factor type A domains? Sequence similarities with malaria thrombospondin-related anonymous
protein, dihydropyridine-sensitive calcium channel and inter-alpha-trypsin inhibitor.
Bork P, Rohde K;

Biochem J 1991;279:908-911.

1. RUGGERI, Z.M. and WARE, J.
von Willebrand factor.

FASEB J. 7 308-316 (1993).

2. COLOMBATTI, A., BONALDO, P. and DOLIANA, R.

Type A modules: interacting domains found in several non-fibrillar collagens and in other extracellular ^matrix pro- 
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is involved in both ubiquinone and menaquinone biosynthesis and which catalvzes tho c^h^w.™ ^ 


They can be picked up in the database by the following patterns 5Kd)
Consensus pattern: Y-D-x-M-N-x(2)-[LIVM]-S-x(3)-H-x(2)-W
Consensus pattern: R-V-[LIVM]-K-[PV]-G-G-x-[LIVMF]-x(2)-[LIVM]-E-x-S

[1] Lee P.T., Hsu A.Y., Ha H.T., Clarke C.F. J. Bacteriol. 179:1748-1754(1997).
[1607] 707. Uricase signature

Consensus pattern: [LV]-x-[LV]-[LIV]-K-[STV]-[ST]-x-[SN]-x-F-x(2)-[FY]-x(4)-[FY]-x(2)-L-x(5)-R
[1] Motojima K., Kanaya S., Goto S. J. Biol. Chem. 263:16677-16681(1988).
[1608] 708. Universal stress protein family (Usp)

ffisss sssr^rsss;'-* ^ 01 Escheftihs - — — ■ * 

[1610] 709. Ubiquitin domain signature and profile 

ATP^ependent selective degra^ion of ceNufarp'S 
to tai. repeats of ubiqufth^^^^ 

proteins consisting of a single copy of ubiquruMused to a C4«lin!f ^ °' 9 ° neS pr0dUces precursor 

relatedtoubiquftin -UbS 

elegans hypothetical prtfeTpsS 3 T^J ™? SchH t osac, * arom y c «s pombe protein alpH and Caenorhabditis 
ov'dcmi - ScSSro^^^ - • CAP- 

ubiquitin domain - Yeast orotein SMTa H-J-dTv . S PAC26A3.16. This protein contains a terminal 
protein SMT3C (a IsotSwT* Jbh sZ- r «'"n-"Ke protems SMT3A and SMT3B. - Human ubiquitin-iike 
to me nuclear pSTeSitSE ^1^^ ^ ^ " h ^etingranGAPl 
amino acid residues. Structurally G-beta consists of eight tandem repeats of about 40 residues, each containing a
central Trp-Asp motif (this type of repeat is sometimes called a WD-40 repeat). Such a repetitive segment has been
shown [E1,2,3,4,5] to exist in a number of other proteins listed below:

- Yeast STE4, a component of the pheromone response pathway. STE4 is a G-beta like protein that associates with
GPA1 (G-alpha) and STE18 (G-gamma).
- Yeast MSI1, a negative regulator of RAS-mediated cAMP synthesis. MSI1 is most probably also a G-beta protein.
- Human and chicken protein 12.3. The function of this protein is not known, but on the basis of its similarity to
G-beta proteins, it may also function in signal transduction.
- Chlamydomonas reinhardtii gblp. This protein is most probably the homolog of vertebrate protein 12.3.
- Human LIS1, a neuronal protein involved in type-1 lissencephaly [E2].
- Mammalian coatomer beta' subunit (beta'-COP), a component of a cytosolic protein complex that reversibly
associates with Golgi membranes to form vesicles that mediate biosynthetic protein transport.
- Yeast CDC4, essential for initiation of DNA replication and separation of the spindle pole bodies to form the poles
of the mitotic spindle.
- Yeast CDC20, a protein required for two microtubule-dependent processes: nuclear movements prior to anaphase
and chromosome separation.
- Yeast MAK11, essential for cell growth and for the replication of M1 double-stranded RNA.
- Yeast PRP4, a component of the U4/U6 small nuclear ribonucleoprotein with a probable role in mRNA splicing.
- Yeast PWP1, a protein of unknown function.
- Yeast SKI8, a protein essential for controlling the propagation of double-stranded RNA.
- Yeast SOF1, a protein required for ribosomal RNA processing which associates with U3 small nucleolar RNA.
- Yeast TUP1 (also known as AER2 or SFL2 or CYC9), a protein which has been implicated in dTMP uptake,
catabolite repression, mating sterility, and many other phenotypes.
- Yeast YCR57c, an ORF of unknown function from chromosome III.
- Yeast YCR72c, an ORF of unknown function from chromosome III.
- Slime mold coronin, an actin-binding protein.
- Slime mold AAC3, a developmentally regulated protein of unknown function.
- Drosophila protein Groucho (formerly known as E(spl); 'enhancer of split'), a protein involved in neurogenesis and
that seems to interact with the Notch and Delta proteins.
- Drosophila TAF-II-80, a protein that is tightly associated with TFIID.

[1626] The number of repeats in the above proteins varies between 5 (PRP4, TUP1, and Groucho) and 8 (G-beta,
STE4, MSI1, AAC3, CDC4, PWP1, etc.). In G-beta and G-beta like proteins, the repeats span the entire length of the
sequence, while in other proteins, they make up the N-terminal, the central or the C-terminal section.
[1627] A signature pattern can be developed from the central core of the domain (positions 9 to 23).

- Consensus pattern: [LIVMSTAC]-[LIVMFYWSTAGC]-[LIMSTAG]-[LIVMSTAGC]-x(2)-[DN]-
x(2)-[LIVMWSTAC]-x-[LIVMFSTAG]-W-[DEN]-[LIVMFSTAGCN]

[1] Gilman A.G.
Annu. Rev. Biochem. 56:615-649(1987).
[2] Duronio R.J., Gordon J.I., Boguski M.S.
Proteins 13:41-56(1992).
[3] van der Voorn L., Ploegh H.L.
FEBS Lett. 307:131-134(1992).
[4] Neer E.J., Schmidt C.J., Nambudripad R., Smith T.F.
Nature 371:297-300(1994).
[5] Smith T.F., Gaiatzes C.G., Saxena K., Neer E.J.
Biochemistry In Press(1998).



[1628] 718. WHEP-TRS domain containing proteins 

A conserved domain of 46 amino acids has been shown [1] to exist in a number of higher eukaryote aminoacyl-transfer
RNA synthetases. This domain is present one to six times in the following enzymes:
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teins. 

MATRIX 13 297-306 (1993). 



^PERKINS. S.J.. SMITH, K.F., WILLIAMS, S.C.. MARIS, P.I., CHAPMAN, D and SIM R B 
^ns^on^i^aredsp^^^copy' 0 ' 1 ^'" e k fand * ac ' or 'VP 8 ^ domain in factor 8 of human complement by Fourier 
..s^urrenoe h co.fcgen types VI, VII, ». and XIV. the integrins and other proteins by averaged structure pre- 

J. MOL. BIOL. 238 104-119 (1994).

4. BORK, P. and ROHDE, K.

5. EDWARDS, Y.J.K. and PERKINS, S.J.
FEBS LETT. 358 283-286 (1995).

6. LEE, J.O., RIEU, P., ARNAOUT, M.A. and LIDDINGTON, R.

CE2^3^;f9^. d0main ' r0m ^ a ' Pha SUbUni ' CR3 (CDHb/0018). 

7. QU, A. and LEAHY, D.J.

P?ct N S 

integrins (.KJomains); collagen types VI VH Si^dETv, Actors B, C2, CR3 and CR4; the 

vWFdo m ainsp artfc ipa,einnuILsbio^ 

signal transduction), involvinq interaction with * ' c 7 i adnes,on - "Ration, hommg, pattern formation, and 
aligned vWF sequences ha! ^eveS aTarol !£ 9 , Y °' W SeCOndary stmcture Potion from 75 

ognition algorithms were usedTs^e sequ^^^^ alpha^e.ices and beta-strands f3]. FoW rec- 

was predicted to be a doub,y-w<>uT^^^^ ^ , ^7 M ™ "» vWF *»■*• ™ 

determined for the domains of integriS CDlT^ £?nH 1 I ^ha-helices 15]. 30 structures have been 
The domain adopts a classic a lp ha?Sa FWa^ 

surface. ., has been suggested thauS m^^Tt^™",?^ ^ ^ at te 

binding protein ligands [6J. The reW* c^sE T„ iSk '^P 6 "^ ad "<*™ *rte (MIDAS) for 

residues invoked in metal ion cxwrdinaLri^aTs rATo nclude « the first beta^trand and 3 conserved 

The ancient regulatory-protein family of WD-repeat proteins
Neer EJ, Schmidt CJ, Nambudripad R, Smith TF;
Nature 1994;371:297-300

P-^^^ r T ^ " ^ —» -leot^g 

W The alpha subunit binds to and ~P«« 
they seem to be required for the repLement SwifSEl ^ESST* " h " bUt 

ognition, y ' r as we " as ,or membrane anchoring and receptor rec- 

[1625] In higher eukaryotes G-beta exists as a small multigene family of highly conserved proteins of about 340
of two main subsets: - Subset 1, to which belong XPG, RAD2 from budding yeast and rad13 from fission yeast. RAD2
and XPG are single-stranded DNA endonucleases [7,8]. XPG makes the 3' incision in human DNA nucleotide excision
repair [9]. - Subset 2, to which belong mouse and human FEN-1, rad2 from fission yeast, and RAD27 from budding
yeast. FEN-1 is a structure-specific endonuclease. In addition to the proteins listed in the above groups, this family
also includes: - Fission yeast exo1, a 5'->3' double-stranded DNA exonuclease that could act in a pathway that corrects
mismatched base pairs. - Yeast EXO1 (DHS1), a protein with probably the same function as exo1. - Yeast DIN7.
Sequence alignment of this family of proteins reveals that similarities are largely confined to two regions. The first is
located at the N-terminal extremity (N-region) and corresponds to the first 95 to 105 amino acids. The second region
is internal (I-region) and found towards the C-terminus; it spans about 140 residues and contains a highly conserved
core of 27 amino acids that includes a conserved pentapeptide (E-A-[DE]-A-[QS]). It is possible that the conserved
acidic residues are involved in the catalytic mechanism of DNA excision repair in XPG. The amino acids linking the N-
and I-regions are not conserved; indeed, they are largely absent from proteins belonging to the second subset. Two
signature patterns have been developed for these proteins. The first corresponds to the central part of the N-region,
the second to part of the I-region and includes the putative catalytic core pentapeptide.
[1636] Consensus pattern: [VI]-[KRE]-P-x-[FYIL]-V-F-D-G-x(2)-[PIL]-x-[LVC]-K-
Consensus pattern: [GS]-[LIVM]-[PER]-[FYS]-[LIVM]-x-A-P-x-E-A-[DE]-[PAS]-[QS]-[CLM]-
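
By way of illustration, the conserved pentapeptide E-A-[DE]-A-[QS] described for this family can be located in a sequence with a short scan. The Python sketch below is not part of the original disclosure; the function name pentapeptide_positions and the test sequences are hypothetical. A lookahead is used so that overlapping candidate sites are all examined.

```python
import re
from typing import List, Tuple

# E-A-[DE]-A-[QS] wrapped in a lookahead so overlapping candidates are not skipped.
PENTAPEPTIDE = re.compile(r"(?=(EA[DE]A[QS]))")


def pentapeptide_positions(sequence: str) -> List[Tuple[int, str]]:
    """Return (1-based start, matched residues) for every pentapeptide hit."""
    return [(m.start() + 1, m.group(1)) for m in PENTAPEPTIDE.finditer(sequence)]


print(pentapeptide_positions("GGEAEAQCL"))  # prints [(3, 'EAEAQ')]
print(pentapeptide_positions("MEADASKK"))   # prints [(2, 'EADAS')]
```

A hit for the pentapeptide alone is far less specific than the full signature patterns given above, so it is best read as a quick screen rather than as evidence of family membership.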

[1637] [1] Tanaka K., Wood R.D. Trends Biochem. Sci. 19:83-86(1994). [2] Scherly D., Nouspikel T., Corlet J., Ucla
C., Bairoch A., Clarkson S.G. Nature 363:182-185(1993). [3] Carr A.M., Sheldrick K.S., Murray J.M., Al-Harithy R.,
Watts F.Z., Lehmann A.R. Nucleic Acids Res. 21:1345-1349(1993). [4] Murray J.M., Tavassoli M., Al-Harithy R.,
Sheldrick K.S., Lehmann A.R., Carr A.M., Watts F.Z. Mol. Cell. Biol. 14:4878-4888(1994). [5] Harrington J.J., Lieber M.R.
Genes Dev. 8:1344-1355(1994). [6] Szankasi P., Smith G.R. Science 267:1166-1169(1995). [7] Habraken Y., Sung P.,
Prakash L., Prakash S. Nature 366:365-368(1993). [8] O'Donovan A., Scherly D., Clarkson S.G., Wood R.D. J. Biol.
Chem. 269:15965-15968(1994). [9] O'Donovan A., Davies A.A., Moggs J.G., West S.C., Wood R.D. Nature
371:432-435(1994).

[1638] 722. Xanthine/uracil permeases family 

The following transport proteins, which are involved in the uptake of xanthine or uracil, are evolutionarily related [1]:

- Uric acid-xanthine permease (gene uapA) from Aspergillus nidulans.
- Purine permease (gene uapC) from Aspergillus nidulans.
- Xanthine permease from Bacillus subtilis (gene pbuX).
- Uracil permease from Escherichia coli (gene uraA) [2] and Bacillus (gene pyrP).
- Hypothetical protein ycdG from Escherichia coli.
- Hypothetical protein ygfO from Escherichia coli.
- Hypothetical protein ygfU from Escherichia coli.
- Hypothetical protein yicE from Escherichia coli.
- Hypothetical protein yunJ from Bacillus subtilis.
- Hypothetical protein yunK from Bacillus subtilis.

[1639] They are proteins of from 430 to 595 residues that seem to contain 12 transmembrane domains. 

The best conserved region which corresponds with what seems to be the tenth transmembrane domain has been 

selected as a signature pattern. 

- Consensus pattern: [LIVM]-P-x-[PASIF]-V-[LIVM]-G-G-x(4)-[LIVM]-[FYHGSA]-x-[LIVM]-x(3)-G 

[1] Diallinas G., Gorfinkiel L., Arst H.N. Jr., Cecchetto G., Scazzocchio C. J. Biol. Chem. 270:8610-8622(1995).
[2] Andersen P.S., Frees D., Fast R., Mygind B. J. Bacteriol. 177:2008-2013(1995).

[1640] 723. Hypothetical yabO/yceC/sfhB family

The following proteins, which seem to belong to a family of pseudouridine synthases (EC 4.2.1.70) [1], have been
shown to share regions of similarities:

- Escherichia coli and Haemophilus influenzae ribosomal large subunit pseudouridine synthase A (gene rluA). It is
responsible for synthesis of pseudouridine from uracil-746 in 23S rRNA.
- Escherichia coli and Haemophilus influenzae ribosomal large subunit pseudouridine synthase C (gene rluC). It is
responsible for synthesis of pseudouridine from uracil at positions 955, 2504 and 2580 in 23S rRNA.
- Escherichia coli protein and homologs in other bacteria large subunit pseudouridine synthase D (gene rluD).
- Yeast DRAP deaminase (gene RIB2).
- Escherichia coli hypothetical protein yqcB and HI1435, the corresponding Haemophilus influenzae protein.
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- D'osop^amuMun^ 

- Mamrna,*n, „ sect , nema(odeandp|ant g , cykRNAsynthetase .e^S^^^ 

- Mammalian histidyMRNA synthetase. The domain is found at the N-terminal extremity. 

[1630] A signature pattern based on the first 29 positions of the WHEP-Domain has been developed. 

- Consensus pattern: [QY]^4DNEA]-x^LIV]^KR]-x(2)-K-x(2)4KRNGJ-[AS]-x(4)^UV]-[DENK]-x(2)-IIV]-x(2)-L-x 

J. Biol. Chem. 268:7660-7667(1993). 

[1631] 719. (Worm family 8) Putative membrane protein
Analysis of protein domain families in Caenorhabditis elegans
Sonnhammer EL, Durbin R;
Genomics 1997;46:200-216.

This family, called family 8 in [1], may be a transmembrane protein.
The specific function of this protein is unknown.
[1632] 720. Xylose isomerase 

to require rragnesiurnSa^ 

number of resLes 7^^^^ ** « «*"»■ A 

KS SoZreT^nf ^ " h " B * * h""*-* and is manganese^ependent 

[E is a magnesium ligand] 
[K is an active site residue] 

- Consensus pattern: [FL]-H-D-x-D-[LIV]-x-[PD]-x-[GDE]
[H is an active site residue] 

[1] Dauter Z., Dauter M., Hemker J., Witzel H., Wilson K.S.
FEBS Lett. 247:1-8(1989).
[2] Kristo P.A., Saarelainen R., Fagerstrom R., Aho S., Korhola M.
Eur. J. Biochem. 237:240-246(1996).
[3] Henrick K., Collyer C.A., Blow D.M.
J. Mol. Biol. 208:129-157(1989).
[4] Vangrysperre W., Ampe C., Kersters-Hilderson H., Tempst P.
Biochem. J. 263:195-199(1989).

called XPG (or XPGC) [2]. XPG belongs to a family of proteins [2,3,4,5,6] that are composed
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Arg and Lys. 

- Carboxypeptidase N (EC 3.4.17.3) (also known as arginine carboxypeptidase), a plasma enzyme which protects 
the body from potent vasoactive and inflammatory peptides containing C-terminal Arg or Lys (such as kinins or 
anaphylatoxins) which are released into the circulation. 

- Carboxypeptidase H (EC 3.4.17.10) (also known as enkephalin convertase or carboxypeptidase E), an enzyme 
located in secretory granules of pancreatic islets, adrenal gland, pituitary and brain. This enzyme removes residual 
C-terminal Arg or Lys remaining after initial endoprotease cleavage during prohormone processing. 

- Carboxypeptidase M (EC 3.4.17.12), a membrane bound Arg and Lys specific enzyme. 

It is ideally situated to act on peptide hormones at local tissue sites where it could control their activity before or after 
interaction with specific plasma membrane receptors. 

- Mast cell carboxypeptidase (EC 3.4.17.1), an enzyme with a specificity to carboxypeptidase A, but found in the 
secretory granules of mast cells. 

- Streptomyces griseus carboxypeptidase (Cpase SG) (EC 3.4.17.-) [3], which combines the specificities of
mammalian carboxypeptidases A and B.

- Thermoactinomyces vulgaris carboxypeptidase T (EC 3.4.17.18) (CPT) [4], which also combines the specificities
of carboxypeptidases A and B.

- AEBP1 [5], a transcriptional repressor active in preadipocytes. AEBP1 seems to regulate transcription by cleavage 
of other transcriptional proteins. 

- Yeast hypothetical protein YHR132c.

[1648] All of these enzymes bind an atom of zinc. Three conserved residues are implicated in the binding of the zinc
atom: two histidines and a glutamic acid. Two signature patterns which contain these three zinc ligands have been
derived.

- Consensus pattern: [PK]-x-[LIVMFY]-x-[LIVMFY]-x(4)-H-[STAG]-x-E-x-[LIVM]-[STAG]-x(6)-[LIVMFYTA] [H and E
are zinc ligands]

- Consensus pattern: H-[STAG]-x(3)-[LIVME]-x(2)-[LIVMFYW]-P-[FYW] [H is a zinc ligand] 
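
Signature patterns written in this notation can be checked against a sequence mechanically. The following is a minimal Python sketch, assuming the usual PROSITE conventions (x for any residue, square brackets for allowed residues, curly braces for excluded residues, and a parenthesized repeat count). The names `prosite_to_regex` and `find_signature`, the transcription of the two patterns, and the test peptide are illustrative assumptions and are not taken from the source databases.

```python
import re

# Assumed transcriptions of the two zinc carboxypeptidase signatures.
ZN_CARBOXYPEPTIDASE_1 = (
    "[PK]-x-[LIVMFY]-x-[LIVMFY]-x(4)-H-[STAG]-x-E-x-[LIVM]-[STAG]-x(6)-[LIVMFYTA]"
)
ZN_CARBOXYPEPTIDASE_2 = "H-[STAG]-x(3)-[LIVME]-x(2)-[LIVMFYW]-P-[FYW]"


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.split("-"):
        # Separate the residue specification from an optional (n) or (n,m) count.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.group(1), m.group(2), m.group(3)
        if core == "x":
            core = "."                       # x: any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"   # {..}: any residue except those listed
        # [..] classes and single letters are already valid regex fragments.
        if low and high:
            core += "{%s,%s}" % (low, high)
        elif low:
            core += "{%s}" % low
        parts.append(core)
    return "".join(parts)


def find_signature(pattern, sequence):
    """Return (start, end) offsets of every non-overlapping match in sequence."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start(), m.end()) for m in regex.finditer(sequence)]


if __name__ == "__main__":
    # Hypothetical peptide built only to satisfy the second signature.
    print(find_signature(ZN_CARBOXYPEPTIDASE_2, "GGHSAAALAALPFGG"))
```

A hit only indicates that the residue spacing is compatible with the signature; it is not by itself evidence of zinc binding or enzymatic activity.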

[ 1] Tan F., Chan S.J., Steiner D.F., Schilling J.W., Skidgel R.A.
J. Biol. Chem. 264:13165-13170(1989).

[ 2] Reynolds D.S., Stevens R.L., Gurley D.S., Lane W.S., Austen K.F.,
Serafin W.E.
J. Biol. Chem. 264:20094-20099(1989).

[ 3] Narahashi Y.
J. Biochem. 107:879-886(1990).

[ 4] Teplyakov A., Polyakov K., Obmolova G., Strokopytov B., Kuranova I.,
Osterman A.L., Grishin N.V., Smulevitch S.V., Zagnitko O.P.,
Galperina O.V., Matz M.V., Stepanov V.M.
Eur. J. Biochem. 208:281-288(1992).

[ 5] He G.-P., Muise A., Li A.W., Ro H.-S.
Nature 378:92-96(1995).

[ 6] Hourdou M.-L., Guinand M., Vacheron M.J., Michel G., Denoroy L.,
Duez C.M., Englebert S., Joris B., Weber G., Ghuysen J.-M.
Biochem. J. 292:563-570(1993).

[ 7] Rawlings N.D., Barrett A.J.
Meth. Enzymol. 248:183-228(1995).

[1649] 726. Zinc finger, C2H2 type 

The C2H2 zinc finger is the classical zinc finger domain. 

The two conserved cysteines and histidines coordinate a zinc ion. The following pattern describes the zinc finger:
#-X-C-X(1-5)-C-X3-#-X5-#-X2-H-X(3-6)-[H/C]

Where X can be any amino acid, and numbers in brackets indicate the number of residues. The positions marked #
are those that are important for the stable fold of the zinc finger. The final position can be either His or Cys.
The C2H2 zinc finger is composed of two short beta strands followed by an alpha helix. The amino-terminal part of the
helix binds the major groove in DNA-binding zinc fingers.

[1650] 'Zinc finger' domains [1-5] are nucleic acid-binding protein structures first identified in the Xenopus transcrip-
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- Haemophilus influenzae hypothetical protein HI0042 

- Aquifex aeolicus hypothetical protein AQ_1759.

- Bacillus subtilis hypothetical protein yhcT.

- Bacillus subtilis hypothetical protein yjbO.

- Bacillus subtilis hypothetical protein ylyB.

- Helicobacter pylori hypothetical protein HP0347.

- Helicobacter pylori hypothetical protein HP0745.

- Helicobacter pylori hypothetical protein HP0956.

- Mycoplasma genitalium hypothetical protein MG209.

- Mycoplasma genitalium hypothetical protein MG370.

- Synechocystis strain PCC 6803 hypothetical protein slr1592.

- Synechocystis strain PCC 6803 hypothetical protein slr1629.

- Yeast hypothetical protein YDL036c.

- Yeast hypothetical protein YGR169c.

- Fission yeast hypothetical protein SPAC18B11.02c.

- Caenorhabditis elegans hypothetical protein K07E8.7.

(EC 4 2 , .70, „, ^Vwn to ££££2 SZL'*" " ° *™> « ***** 

- Aquifex aeolicus hypothetical protein AQ_1464.

- Bacillus subtilis hypothetical protein ypuL.

- Bacillus subtilis hypothetical protein ytzR.

- Borrelia burgdorferi hypothetical protein BB0129.

- Helicobacter pylori hypothetical protein HP1459.

- Synechocystis strain PCC 6803 hypothetical protein slr0361.

- Synechocystis strain PCC 6803 hypothetical protein slr0612.

• Con» raU «p.„ s „ G.R- L .D-,C(2HST^«^ U VFAHUVMF10HSTHDNST1 
[1645] [ 1] Wrzesinski J., Bakin A Nurse K 1 flno or o» . ~ 

[1646] 724. Zinc finger present in dystrophin, CBP/p300

ZZ in dystrophin binds calmodulin.
Putative zinc finger; binding not yet shown.
[1647] 725. Zinc carboxypeptidase
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after the second cysteine; it is generally an aromatic or aliphatic residue. 

- Consensus pattern: C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H [The two C's and two H's are zinc ligands]

[ 1] Klug A., Rhodes D.
Trends Biochem. Sci. 12:464-469(1987).

[ 2] Evans R.M., Hollenberg S.M.
Cell 52:1-3(1988).

[ 3] Payre F., Vincent A.
FEBS Lett. 234:245-250(1988).

[ 4] Miller J., McLachlan A.D., Klug A.
EMBO J. 4:1609-1614(1985).

[ 5] Berg J.M.
Proc. Natl. Acad. Sci. U.S.A. 85:99-102(1988).

[ 6] Rosenfeld R., Margalit H.
J. Biomol. Struct. Dyn. 11:557-570(1993).

[1654] 727. Zinc finger, C3HC4 type (RING finger)

A number of eukaryotic and viral proteins contain a conserved cysteine-rich domain of 40 to 60 residues (called C3HC4
zinc-finger or 'RING' finger) [1] that binds two atoms of zinc, and is probably involved in mediating protein-protein
interactions. The 3D structure of the zinc ligation system is unique to the RING domain and is referred to as the
'cross-brace' motif. The spacing of the cysteines in such a domain is
C-x(2)-C-x(9 to 39)-C-x(1 to 3)-H-x(2 to 3)-C-x(2)-C-x(4 to 48)-C-x(2)-C.
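
The cysteine/histidine spacing stated above maps directly onto a regular expression with bounded repeats. The sketch below is only an illustration under stated assumptions: the function name `find_ring_candidates` and the test sequence are hypothetical, and a spacing match on its own does not show that a sequence folds as a RING finger or binds zinc.

```python
import re

# Direct transcription of the stated spacing:
# C-x(2)-C-x(9 to 39)-C-x(1 to 3)-H-x(2 to 3)-C-x(2)-C-x(4 to 48)-C-x(2)-C
RING_SPACING = re.compile(
    r"C.{2}C.{9,39}C.{1,3}H.{2,3}C.{2}C.{4,48}C.{2}C"
)


def find_ring_candidates(sequence):
    """Return (start, end) offsets of regions whose C/H spacing fits the C3HC4 layout."""
    return [(m.start(), m.end()) for m in RING_SPACING.finditer(sequence)]


if __name__ == "__main__":
    # Hypothetical sequence using the minimum spacer lengths, padded on both sides.
    seq = "MM" + "CAAC" + "A" * 9 + "CAHAACAAC" + "AAAA" + "CAAC" + "KK"
    print(find_ring_candidates(seq))
```

Because the two long spacers (9 to 39 and 4 to 48 residues) are very permissive, a scan like this is best treated as a coarse pre-filter ahead of the narrower signature pattern.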

[1655] Proteins currently known to include the C3HC4 domain are listed below (references are only provided for 
recently determined sequences). 

- Mammalian V(D)J recombination activating protein (gene RAG1). RAG1 activates the rearrangement of
immunoglobulin and T-cell receptor genes.

- Mouse rpt-1. Rpt-1 is a trans-acting factor that regulates gene expression directed by the promoter region of the
interleukin-2 receptor alpha chain or the LTR promoter region of HIV-1.

- Human rfp. Rfp is a developmentally regulated protein that may function in male germ cell development.
Recombination of the N-terminal section of rfp with a protein tyrosine kinase produces the ret transforming protein.

- Human 52 Kd Ro/SS-A protein. A protein of unknown function from the Ro/SS-A ribonucleoprotein complex. Sera
from patients with systemic lupus erythematosus or primary Sjogren's syndrome often contain antibodies that react
with the Ro proteins.

- Human histocompatibility locus protein RING1.

- Human PML, a probable transcription factor. Chromosomal translocation of PML with retinoic receptor alpha
creates a fusion protein which is the cause of acute promyelocytic leukemia (APL).

- Mammalian breast cancer type 1 susceptibility protein (BRCA1) [E1].

- Mammalian cbl proto-oncogene.

- Mammalian bmi-1 proto-oncogene.

- Vertebrate CDK-activating kinase (CAK) assembly factor MAT1, a protein that stabilizes the complex between the
CDK7 kinase and cyclin H (MAT1 stands for 'Menage A Trois').

- Mammalian mel-18 protein. Mel-18, which is expressed in a variety of tumor cells, is a transcriptional repressor that
recognizes and binds a specific DNA sequence.

- Mammalian peroxisome assembly factor-1 (PAF-1) (PMP35), which is somewhat involved in the biogenesis of
peroxisomes. In humans, defects in PAF-1 are responsible for a form of Zellweger syndrome, an autosomal
recessive disorder associated with peroxisomal deficiencies.

- Human MAT1 protein, which interacts with the CDK7-cyclin H complex.

- Human RING1 protein.

- Xenopus XNF7 protein, a probable transcription factor.

- Trypanosoma protein ESAG-8 (T-LR), which may be involved in the posttranscriptional regulation of genes in VSG
expression sites or may interact with adenylate cyclase to regulate its activity.

- Drosophila proteins Posterior Sex Combs (Psc) and Suppressor two of zeste (Su(z)2). The two proteins belong
to the Polycomb group of genes needed to maintain the segment-specific repression of homeotic selector genes.

- Drosophila protein male-specific msl-2, a DNA-binding protein which is involved in X chromosome dosage
compensation (the elevation of transcription of the male single X chromosome).

- Arabidopsis thaliana protein COP1, which is involved in the regulation of photomorphogenesis.
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have demonstrated the^inc^n^ 

[1652] Some of the proteins Enown ^o include C2H^Z ^ * W8Cta * t 

regionsfoundineachofmeseproteins^dSe^ 

data* available and that additional finger domains m^pS^ ^^''"^^^^^^ 

. - Emencella niduians: brlA (2) creA (2) 1 ' 1 (2) 
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[ 1 ] Aitken A. 

Trends Biochem. Sci. 20:95-97(1995). 

[ 2] Morrison D. 

Science 266:56-57(1994). 

[ 3] Xiao B., Smerdon S.J., Jones D.H., Dodson G.G., Soneji Y., Aitken A., Gamblin S.J.
Nature 376:188-191(1995). 

[1670] 733. D-isomer specific 2-hydroxyacid dehydrogenases (2 Hacid DH) 

This Pfam covers the Formate dehydrogenase, D-glycerate dehydrogenase and D-lactate dehydrogenase families in
SCOP. A number of NAD-dependent 2-hydroxyacid dehydrogenases which seem to be specific for the D-isomer of
their substrate have been shown [1,2,3,4] to be functionally and structurally related. These enzymes are listed below.

- D-lactate dehydrogenase (EC 1.1.1.28), a bacterial enzyme which catalyzes the oxidation of D-lactate to pyruvate.

- D-glycerate dehydrogenase (EC 1.1.1.29) (NADH-dependent hydroxypyruvate reductase), a plant leaf peroxisomal
enzyme that catalyzes the reduction of hydroxypyruvate to glycerate. This reaction is part of the glycolate
pathway of photorespiration.

- D-glycerate dehydrogenase from the bacteria Hyphomicrobium methylovorum and Methylobacterium extorquens. 

- 3-phosphoglycerate dehydrogenase (EC 1.1.1.95), a bacterial enzyme that catalyzes the oxidation of
D-3-phosphoglycerate to 3-phosphohydroxypyruvate. This reaction is the first committed step in the 'phosphorylated'
pathway of serine biosynthesis.

- Erythronate-4-phosphate dehydrogenase (EC 1.1.1.-) (gene pdxB), a bacterial enzyme involved in the biosynthesis
of pyridoxine (vitamin B6).

- D-2-hydroxyisocaproate dehydrogenase (EC 1.1.1.-) (D-hicDH), a bacterial enzyme that catalyzes the reversible 
and stereospecific interconversion between 2-ketocarboxylic acids and D-2-hydroxy-carboxylic acids. 

- Formate dehydrogenase (EC 1.2.1.2) (FDH) from the bacterium Pseudomonas sp. 101 and various fungi [5].

- Vancomycin resistance protein vanH from Enterococcus faecium; this protein is a D-specific alpha-keto acid
dehydrogenase involved in the formation of a peptidoglycan which does not terminate in D-alanine, thus preventing
vancomycin binding.

- Escherichia coli hypothetical protein ycdW.

- Escherichia coli hypothetical protein yiaE.

- Haemophilus influenzae hypothetical protein HI1556.

- Yeast hypothetical protein YER081w.

- Yeast hypothetical protein YIL074W.

[1671] All these enzymes have similar enzymatic activities and are structurally related. Three of the most conserved 
regions of these proteins have been selected to develop patterns. The first pattern is based on a glycine-rich region 
located in the central section of these enzymes; this region probably corresponds to the NAD-binding domain. The two 
other patterns contain a number of conserved charged residues, some of which may play a role in the catalytic mech- 
anism. 

- Consensus pattern: [LIVMAHAG]-[IVTHUVMFYH^ 
wCTH]-[DNSTK] 

- Consensus pattern: [UVMFYWA]-{LIVFYWC]-x(2)-[SAC]-[DNQHR]-[IWA]-fUVn-x-[UVF]-[HNI]-x-P-x(4)-[STN]- 
x(2)-[UVMFJ-x-[GSDN] 

- Consensus pattern: [LMFATC]-[KPQ]-x-[GSTDN]-x-[LIVMFYWR]-[LIVMFYW](2)-N-x-[STAGC]-R-[GP]-x-[LIVH]-
[LIVMC]-[DNV]

[1] Grant G.A. Biochem. Biophys. Res. Commun. 165:1371-1374(1989).

[2] Kochhar S., Hunziker P., Leong-Morgenthaler P.M., Hottinger H. Biochem. Biophys. Res. Commun. 184:60-66
(1992).

[3] Ohta T., Taguchi H. J. Biol. Chem. 266:12588-12594(1991).

[4] Goldberg J.D., Yoshida T., Brick P. J. Mol. Biol. 236:1123-1140(1994).

[5] Popov V.O., Lamzin V.S. Biochem. J. 301:625-643(1994).

[1672] 734. 2-oxo acid dehydrogenases acyltransferase (catalytic domain)

Refined crystal structure of the catalytic domain of dihydrolipoyl transacetylase (E2P) from Azotobacter vinelandii at
2.6 angstroms resolution.

Mattevi A, Obmolova G, Kalk KH, Westphal AH, De Kok A, Hol WG;
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- Fungal DNA repair proteins RA05, RAD1 6, RAD 18 and rad& 

- Herpesviruses trans-acting transcriptional protein ICP0/IE110.

- Baculoviruses major immediate early protein (PE-38). 

- Baculoviruses immediate-early regulatory protein IE-N/IE-2 

- Caenorhabditis elegans hypothetical proteins F54G8.4, R05D3.4 and T02C1.1.

- Yeast hypothetical proteins YER116c and YKR017c.

[1656] The central region of the domain was selected as a signature pattern for the C3HC4 finger. 

- Consensus pattern: C-x-H-x-[LIVMFY]-C-x(2)-C-[LIVMYA] 

[1657] [ 1] Borden K.L.B., Freemont P.S.

Curr. Opin. Struct. Biol. 6:395-401(1996). 

[1658] 728. Zinc finger C-x8-C-x5-C-x3-H type (and similar)

[1659] 729. Zinc finger, CCHC class

A family of CCHC zinc fingers, mostly from retroviral gag proteins (nucleocapsid). Prototype structure is from HIV.
Also contains members involved in eukaryotic gene regulation, such as C. elegans GLH-1.

Structure is an 18-residue zinc finger; no examples of indels in the alignment.
[1660] 730. Zn-finger in Ran binding protein and others

[1661] 731. AN1-like Zinc finger

Where X can be any amino acid, and numbers in brackets indicate the number of residues.

[1663] [1] Linnen JM, Bailey CP, Weeks DL; Gene 1993;128:181-188.
[1664] 732. 14-3-3 proteins

Structure of a 14-3-3 protein and implications for coordination of multiple signalling pathways 

Liu D, Bienkowska J, Petosa C, Collier RJ, Fu H, Liddington R
Nature 1995;376:191-194 

E «"rSS !£UK£2? ~- ls M °> " - -»—" <* 

Cell 1996;84:889-897 

[1667] Molecular evolution of the 14-3-3 protein family.

Wang W, Shakes DC

J Mol Evol 1996;43:384-398.

Function of 14-3-3 proteins. 

Jin DY, Lyu MS, Kozak CA, Jeang KT 

Nature 1996;382:308-308 

- Consensus pattern: R-N-L-[LIV]-S-[VG]-[GA]-Y-[KN]-N-[IVA]

- Consensus pattern: Y-K-[DE]-S-T-L-I-[IM]-Q-L-[LF]-[RHC]-D-N-[LF]-T-[LS]-W-[TAN]-[SAD]
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[1 680] Prokaryotic and eukaryotic 6PGD are proteins of about 470 amino acids whose sequence are highly conserved 
[1]. A region which has been shown [2], from studies of the sheep 6PGD tertiary structure, to be involved in the binding
of 6-phosphogluconate has been selected as a signature pattern.

- Consensus pattern: [LIVM]-x-D-x(2)-[GA]-[NQS]-K-G-T-G-x-W

[ 1] Reizer A., Deutscher J., Saier M.H. Jr., Reizer J.
Mol. Microbiol. 5:1081-1089(1991).

[ 2] Adams M.J., Archibald I.G., Bugg C.E., Carne A., Gover S.,
Helliwell J.R., Pickersgill R.W., White S.W.
EMBO J. 2:1009-1014(1983).

[1681] 739. (7tm 1) G-protein coupled receptors [1 to 4,E1,E2] (also called R7G) are an extensive group of receptors
for hormones, neurotransmitters, odorants and light which transduce extracellular signals by interaction with guanine
nucleotide-binding (G) proteins. The receptors that are currently known to belong to this family are listed below.

- 5-hydroxytryptamine (serotonin) 1A to 1F, 2A to 2C, 4, 5A, 5B, 6 and 7 [5].
- Acetylcholine, muscarinic-type, M1 to M5.
- Adenosine A1, A2A, A2B and A3 [6].
- Adrenergic alpha-1A to -1C; alpha-2A to -2D; beta-1 to -3 [7].
- Angiotensin II types I and II.
- Bombesin subtypes 3 and 4.
- Bradykinin B1 and B2.
- C3a and C5a anaphylatoxin.
- Cannabinoid CB1 and CB2.
- Chemokines C-C CC-CKR-1 to CC-CKR-8.
- Chemokines C-X-C CXC-CKR-1 to CXC-CKR-4.
- Cholecystokinin-A and cholecystokinin-B/gastrin.
- Dopamine D1 to D5 [8].
- Endothelin ET-a and ET-b [9].
- fMet-Leu-Phe (fMLP) (N-formyl peptide).
- Follicle stimulating hormone (FSH-R) [10].
- Galanin.
- Gastrin-releasing peptide (GRP-R).
- Gonadotropin-releasing hormone (GNRH-R).
- Histamine H1 and H2 (gastric receptor I).
- Lutropin-choriogonadotropic hormone (LSH-R) [10].
- Melanocortin MC1R to MC5R.
- Melatonin.
- Neuromedin B (NMB-R).
- Neuromedin K (NK-3R).
- Neuropeptide Y types 1 to 6.
- Neurotensin (NT-R).
- Octopamine (tyramine), from insects.
- Odorants [11].
- Opioids delta-, kappa- and mu-types [12].
- Oxytocin (OT-R).
- Platelet activating factor (PAF-R).
- Prostacyclin.
- Prostaglandin D2.
- Prostaglandin E2, EP1 to EP4 subtypes.
- Prostaglandin F2.
- Purinoreceptors (ATP) [13].
- Somatostatin types 1 to 5.
- Substance-K (NK-2R).
- Substance-P (NK-1R).
- Thrombin.
- Thromboxane A2.



EP 1 033 405 A2 

J Mol Biol 1993;230:1183-1199. 

These proteins contain one to three copies of a lipoyl binding domain followed by the catalytic domain.

[1673] 735. 3-beta hydroxysteroid dehydrogenase/isomerase family
Structure and tissue-specific expression of 3 beta-hydroxysteroid dehydrogenase/5-ene-4-ene isomerase genes in
human and rat classical and peripheral steroidogenic tissues.
Labrie F, Simard J, Luu-The V, Pelletier G, Belanger A,
Lachance Y, Zhao HF, Labrie C, Breton N, de Launoit Y, et al.
J Steroid Biochem Mol Biol 1992;41:421-435.

[1674] 736. 3-hydroxyacyl-CoA dehydrogenase 

This family also includes lambda crystallin. 

Structure of L-3-hydroxyacyl-coenzyme A dehydrogenase:
preliminary chain tracing at 2.8-A resolution.

Birktoft JJ, Holden HM, Hamlin R, Xuong NH, Banaszak LJ

Proc Natl Acad Sci U S A 1987;84:8262-8266

forms, with enovl-CoA hvdrata*~ "=cv\ ~nri<* o ^ T . „ A - A - ^ CIUA,5>omes J nydroxyacyl-CoA dehydrogenase 

[1677] The other proteins structurally related to HCDH are:

- Bacterial 3-hydroxybutyryl-CoA dehydrogenase (EC 1.1.1.157), which reduces 3-hydroxybutanoyl-CoA to acetoacetyl-CoA.

- Eye lens protein lambda-crystallin [4], which is specific to lagomorphs (such as rabbit).

pattern has been ^^SS^S^^ " h sequence. Asignature 

- ^njnsus pattern: PNEWSHOWl^^ 

[ 1] Birktoft J.J., Holden H.M., Hamlin R., Xuong N.-H., Banaszak L.J. Proc. Natl. Acad. Sci. U.S.A. 84:8262-8266(1987).
[ 2] Nakahigashi K., Inokuchi H. Nucleic Acids Res. 18:4937-4937(1990).

[ 3] Mullany P., Clayton C.L., Pallen M.J., Slone R., Al-Saleh A., Tabaqchali S. FEMS Microbiol. Lett. 124:61-67
[ 4] Mulders J.W.M., Hendriks W., Blankesteijn W.M., Bloemendal H., de Jong W.W. J. Biol. Chem. 263:

[1678] 737. 60s Acidic ribosomal protein 

Proteins P1, P2, and P0, components of the eukaryotic ribosome stalk. New structural and functional aspects.

Remacha M, Jimenez-Diaz A, Santos C, Briones E, Zambrano R,
Rodriguez Gabriel MA, Guarinos E, Ballesta JP;

Biochem Cell Biol 1995;73:959-968.

This family includes archaebacterial L12, eukaryotic P0, P1 and P2.
[1679] 738. 6-phosphogluconate dehydrogenases

6-phosphogluconate dehydrogenase (EC 1.1.1.44) (6PGD) catalyzes the third step in the hexose monophosphate
shunt, the decarboxylating reduction of 6-phosphogluconate into ribulose 5-phosphate.
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integral membrane proteins with seven transmembrane regions that belong to family 1 of G-protein coupled receptors. 
[1685] In vertebrates four different pigments are generally found. Rod cells, which mediate vision in dim light, contain
the pigment rhodopsin. Cone cells, which function in bright light, are responsible for color vision and contain three or
more color pigments (for example, in mammals: red, blue and green).

[1686] In Drosophila, the eye is composed of 800 facets or ommatidia. Each ommatidium contains eight photoreceptor
cells (R1-R8): the R1 to R6 cells are outer cells, R7 and R8 inner cells. Each of the three types of cells (R1-R6,
R7 and R8) expresses a specific opsin.

[1687] Proteins evolutionarily related to opsins include squid retinochrome, also known as retinal photoisomerase,
which converts various isomers of retinal into 11-cis retinal, and mammalian retinal pigment epithelium (RPE) RGR [3],
a protein that may also act in retinal isomerization.

[1688] The attachment site for retinal in the above proteins is a conserved lysine residue in the middle of the seventh 
transmembrane helix. The pattern that had been developed includes this residue. 

- Consensus pattern: [UVMWACHPGC]-x(3)-[SAC]-K^ 

[K is the retinal binding site] 

[ 1] Applebury M.L., Hargrave P.A.
Vision Res. 26:1881-1895(1986).

[ 2] Fryxell K.J., Meyerowitz E.M.
J. Mol. Evol. 33:367-378(1991).

[ 3] Shen D., Jiang M., Hao W., Tao L., Salazar M., Fong H.K.W.
Biochemistry 33:13117-13125(1994).

[1689] The following descriptions of protein family functions are not provided by the Pfam or Prosite databases 
[1690] 740. BAH 

BAH domain. Number of members: 65 

[1] Medline: 97074677. Molecular cloning of polybromo, a nuclear protein containing multiple domains including
five bromodomains, a truncated HMG-box, and two repeats of a novel domain. Nicolas RH, Goodwin GH; Gene
1996;175:233-240.

[2] Medline: 99198739. The BAH (bromo-adjacent homology) domain: a link between DNA methylation, replication
and transcriptional regulation. Callebaut I, Courvalin J-C, Mornon JP; FEBS Lett 1999;446:189-193.

[1691] 741. ELM2.

ELM2 domain. The ELM2 (Egl-27 and MTA1 homology 2) domain is a small domain of unknown function. Number of 
members: 10 

[1692] 742. Eukproin. EUKARYOTIC_PORIN The major protein of the outer mitochondrial membrane of eukaryotes 
is a porin that forms a voltage-dependent anion-selective channel (VDAC) that behaves as a general diffusion pore 
for small hydrophilic molecules [1 to 4]. The channel adopts an open conformation at low or zero membrane potential 
and a closed conformation at potentials above 30-40 mV. 

This protein contains about 280 amino acids and its sequence is composed of between 12 and 16 beta-strands that span
the mitochondrial outer membrane. Yeast contains two members of this family (genes POR1 and POR2); vertebrates
have at least three members (genes VDAC1, VDAC2 and VDAC3) [5].

A conserved region located at the C-terminal part of these proteins was selected as a signature pattern. 
C^sensus pattem[YH]-x(2)-D-{SPCAD]-x-(STA]^ 

[ 1] Benz R. Biochim. Biophys. Acta 1197:167-196(1994).
[ 2] Mannella C.A. Trends Biochem. Sci. 17:315-320(1992).
[ 3] Dihanich M. Experientia 46:146-153(1990).
[ 4] Forte M., Guy H.R., Mannella C.A. J. Bioenerg. Biomembr. 19:341-350(1987).
[ 5] Sampson M.J., Lovell R.S., Davison D.B., Craigen W.J. Genomics 36:192-196(1996).

[1693] 743. Glyco hydro 19
Chitinases family 19 signatures

cross-reference(s) CHITINASE_19_1, CHITINASE_19_2

Chitinases (EC 3.2.1.14) [1] are enzymes that catalyze the hydrolysis of the beta-1,4-N-acetyl-D-glucosamine linkages
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- Thyrotropin (TSH-R) [1 oj. 

- Thyrotropin releasing factor (TRH-R). 

- Vasopressin V1 a, V1 b and V2. 

- Visual pigments (opsins and rhodopsin) [14].

- Proto-oncogene mas.

- Orphan receptors (whose ligand is not known) from mammals and birds.

- Caenorhabditis elegans putative receptors C06G4.5, C38C10.1, C43C3.2, T27D1.3 and ZC84.4.

[ 1] Strosberg A.D.
Eur. J. Biochem. 196:1-10(1991).

[ 2] Kerlavage A.R.
Curr. Opin. Struct. Biol. 1:394-401(1991).

[ 3] Probst W.C., Snyder L.A., Schuster D.I., Brosius J., Sealfon S.C.
DNA Cell Biol. 11:1-20(1992).

[ 4] Savarese T.M., Fraser C.M.
Biochem. J. 283:1-9(1992).

[ 5] Branchek T.
Curr. Biol. 3:315-317(1993).

[ 6] Stiles G.L.
J. Biol. Chem. 267:6451-6454(1992).

[ 7] Friell T., Kobilka B.K., Lefkowitz R.J., Caron M.G.
Trends Neurosci. 11:321-324(1988).

[ 8] Stevens C.F.
Curr. Biol. 1:20-22(1991).

[ 9] Sakurai T., Yanagisawa M., Masaki T.
Trends Pharmacol. Sci. 13:103-107(1992).

[10] Salesse R., Remy J.J., Levin J.M., Jallal B., Garnier J.
Biochimie 73:109-120(1991).

[11] Lancet D., Ben-Arie N.
Curr. Biol. 3:668-674(1993).

[12] Uhl G.R., Childers S., Pasternak G.
Trends Neurosci. 17:89-93(1994).

[13] Barnard E.A., Burnstock G., Webb T.E.
Trends Pharmacol. Sci. 15:67-70(1994).

[14] Applebury M.L., Hargrave P.A.
Vision Res. 26:1881-1895(1986).

[15] Attwood T.K., Eliopoulos E.E., Findlay J.B.C.
Gene 98:153-159(1991).

[1684] (7tm 1) Visual pigments (opsins) retinal binding site





- Drosophila small optic lobes protein (gene sol), a neuronal protein that contains a calpain-like domain.

- Yeast thiol protease BLH1/YCP1/LAP3.

- Caenorhabditis elegans hypothetical protein C06G4.2, a calpain-like protein.

[1697] Two bacterial peptidases are also part of this family:

- Aminopeptidase C from Lactococcus lactis (gene pepC) [5].

- Thiol protease tpr from Porphyromonas gingivalis.

[1698] Three other proteins are structurally related to this family, but may have lost their proteolytic activity.

- Soybean oil body protein P34. This protein has its active site cysteine replaced by a glycine.

- Rat testin, a Sertoli cell secretory protein highly similar to cathepsin L but with the active site cysteine replaced
by a serine. Rat testin should not be confused with mouse testin which is a LIM-domain protein (see
<PDOC00382>).

- Plasmodium falciparum serine-repeat protein (SERA), the major blood stage antigen. This protein of 111 Kd
possesses a C-terminal thiol-protease-like domain [6], but the active site cysteine is replaced by a serine.

The sequences around the three active site residues are well conserved and can be used as signature patterns.

[1699] Consensus pattern: Q-x(3)-[GE]-x-C-[YW]-x(2)-[STAGC]-[STAGCV] [C is the active site residue]
Note: the residue in position 4 of the pattern is almost always cysteine; the only exceptions are calpains (Leu), bleomycin
hydrolase (Ser) and yeast YCP1 (Ser). Note: the residue in position 5 of the pattern is always Gly except in papaya
protease IV where it is Glu.

Consensus pattern: [LIVMGSTAN]-x-H-[GSACE]-[LIVM]-x-[LIVMAT](2)-G-x-[GSADNH] [H
is the active site residue]

Consensus pattern[FYCH]-[WI]-[HvT]-xlK^^ [N 
is the active site residue] 
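
The positional notes attached to the cysteine active-site pattern, Q-x(3)-[GE]-x-C-[YW]-x(2)-[STAGC]-[STAGCV], can be inspected programmatically once a match is found. The following Python sketch is illustrative only: the function name `describe_active_site`, the dictionary keys and the test peptide are assumptions made for this example, and positions are counted from 1 within the pattern as in the notes above.

```python
import re

# Regex transcription of Q-x(3)-[GE]-x-C-[YW]-x(2)-[STAGC]-[STAGCV]
CYS_ACTIVE_SITE = re.compile(r"Q.{3}[GE].C[YW].{2}[STAGC][STAGCV]")


def describe_active_site(sequence):
    """Locate the first match and report pattern positions 4 and 5 and the active-site Cys."""
    match = CYS_ACTIVE_SITE.search(sequence)
    if match is None:
        return None
    hit = match.group(0)
    return {
        "start": match.start(),                # 0-based offset of the leading Gln
        "position4": hit[3],                   # usually Cys; Leu or Ser in the listed exceptions
        "position5": hit[4],                   # Gly, or Glu as noted for papaya protease IV
        "active_site_cys": match.start() + 6,  # 0-based offset of the catalytic Cys (pattern position 7)
    }


if __name__ == "__main__":
    # Hypothetical peptide constructed to satisfy the pattern.
    print(describe_active_site("AAQGSCGSCWAFSAA"))
```

Reporting positions 4 and 5 separately makes it easy to flag hits that carry one of the exceptional residues mentioned in the notes.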

Note: these proteins belong to families C1 (papain-type) and C2 (calpains) in the classification of peptidases [7,E1].

[ 1] Dufour E. Biochimie 70:1335-1342(1988).
[ 2] Kirschke H., Barrett A.J., Rawlings N.D. Protein Prof. 2:1587-1643(1995).
[ 3] Shi G.-P., Chapman H.A., Bhairi S.M., Deleeuw C., Reddy V.Y., Weiss S.J. FEBS Lett. 357:129-134(1995).
[ 4] Velasco G., Ferrando A.A., Puente X.S., Sanchez L.M., Lopez-Otin C. J. Biol. Chem. 269:27136-27142(1994).
[ 5] Chapot-Chartier M.P., Nardi M., Chopin M.C., Chopin A., Gripon J.C. Appl. Environ. Microbiol. 59:330-333(1993).
[ 6] Higgins D.G., McConnell D.J., Sharp P.M. Nature 340:604-604(1989).
[ 7] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:461-486(1994).

[1700] 746. Peptidase M22 

Glycoprotease family signature cross-reference(s) GLYCOPROTEASE

Glycoprotease (GCP) (EC 3.4.24.57) [1], or O-sialoglycoprotein endopeptidase, is a metalloprotease secreted by
Pasteurella haemolytica which specifically cleaves O-sialoglycoproteins such as glycophorin A. The sequence of GCP is
highly similar to the following uncharacterized proteins:

- Escherichia coli hypothetical protein ygjD (ORF-X).

- Bacillus subtilis hypothetical protein ydiE.

- Mycobacterium leprae hypothetical protein U229E.

- Mycobacterium tuberculosis hypothetical protein MtCY78.10.

- Synechocystis strain PCC 6803 hypothetical protein slr0807.

- Methanococcus jannaschii hypothetical protein MJ1130.

- Haloarcula marismortui hypothetical protein in HSH 3'region.

- Yeast hypothetical protein YKR038c.

- Yeast hypothetical protein QRI7.

[1701] One of the conserved regions contains two conserved histidines. It is possible that this region is involved in
coordinating a metal ion such as zinc.

[1702] Consensus pattern: [KR]-[GSAT]-x(4)-[FYWLH]-[DQNGK]-x-P-x-[LIVMFY]-x(3)-H-x(2)-[AG]-H-[LIVM]
Note: these proteins belong to family M22 in the classification of peptidases [2,E1].
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in chitin polymers. From the view point of sequence similarity chitinases belong to either famih, in ™ no • », , 
ficafon of glycosy, hydropses [2.E1 ]. Chitinases of family 1 9 (also known a£££e7S iTZ i ° n ^ 
from plants that function. In the defense aaainsi funnel and in\^, ^ u IA or I and IB or II) are enzymes 
wa, Cass and IB,I enzymes dTS^tS S^S^r 9 

involved in a disulfide bond. conserved ,n most, rf not all. of these chrtinases and which is probably 

srT pat,ern ^ 

[1694] Consensus pattern: [LIVM]-[GSA]-F-x-[STAG](2)-[LIVMFY]-W-[FY]-W-[LIVM]

[ 1] Flach J., Pilet P.-E., Jolles P. Experientia 48:701-716(1992).
[ 2] Henrissat B. Biochem. J. 280:309-316(1991).

[1695] 744. MBD 
Methyl-CpG binding domain 

MBDs are found in several Methyl-CpG binding proteins and also DNA demethylase [2].

[1696] 745. Peptidase C1 

Eukaryotic thiol (cysteine) proteases active sites 

?^pSSe H, a°s^^ 

completes the e^SX-K^SS^ 2£ ? 8 ""^ Side 30 «ne 

(referer^sareonfypr^ren^ 

ST^SE^ ^ seems ^ h ■"•» w 

■ sreCr^ 

(EC 3.4.22,4); papay. latex pS^^SS^^Sa^ 1 ^ 1 ^ Sf? 

teinasfi IV rpr q/oo o C \. * wiynupapain (tc 3.4.22.6), cancain (EC 3 4 22 301 and nm- 

House-dust mites allergens DerP1 and EurM1 

AC-2). and Ostertagia ostertagi (CP-1 and CP-3) Haemoncnus contortus (genes AC-1 and 

Slime mold cysteine proteinases CP1 and CP2. 
Cruzipain from Trypanosoma cruzi and brucei 

Trophozoite cysteine proteinase (TCP) from various Plasmodium species.
Proteases from Leishmania mexicana, Theileria annulata and Theileria parva.
Baculoviruses cathepsin-like enzyme (v-cath).
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[1711] Us sequence is moderately conserved between prokaryotes (gene tolC) and eukaryotes. We developed two 
signature patterns based on the conserved regions which are rich in glycine residues and could play a role in the 
catalytical activity and/or in substrate binding. 
Description of pattern(s) and/or profile(s)
Consensus pattem[LIVMFYJ-x-[LIVM^^ 

Consensus pattern: [LIVMFY](2)-E-x-G-[LIVM]-[GA]-G-x(2)-D-x-[GST]-x-[LIVM](2)

[1712] [ 1] Shane B., Garrow T., Brenner A., Chen L., Choi Y.J., Hsu J.C., Stover P. Adv. Exp. Med. Biol. 338:629-634
(1993).

[1713] 750. (Peptidase M3) Neutral zinc metallopeptidases, zinc-binding region signature 

The majority of zinc-dependent metallopeptidases (with the notable exception of the carboxypeptidases) share a common
pattern of primary structure [1,2,3] in the part of their sequence involved in the binding of zinc, and can be grouped
together as a superfamily, known as the metzincins, on the basis of this sequence similarity. They can be classified into
a number of distinct families [4,E1] which are listed below along with the proteases which are currently known to belong
to these families.
[1714] Family M1

- Bacterial aminopeptidase N (EC 3.4.11.2) (gene pepN).

- Mammalian aminopeptidase N (EC 3.4.11.2).

- Mammalian glutamyl aminopeptidase (EC 3.4.11.7) (aminopeptidase A). It may play a role in regulating growth
and differentiation of early B-lineage cells.

- Yeast aminopeptidase yscII (gene APE2).

- Yeast alanine/arginine aminopeptidase (gene AAP1).

- Yeast hypothetical protein YIL137c.

- Leukotriene A-4 hydrolase (EC 3.3.2.6). This enzyme is responsible for the hydrolysis of an epoxide moiety of
LTA-4 to form LTB-4; it has been shown that it binds zinc and is capable of peptidase activity.

[1715] Family M2

- Angiotensin-converting enzyme (EC 3.4.15.1) (dipeptidyl carboxypeptidase I) (ACE), the enzyme responsible for
hydrolyzing angiotensin I to angiotensin II. There are two forms of ACE: a testis-specific isozyme and a somatic
isozyme which has two active centers.

[1716] Family M3

- Thimet oligopeptidase (EC 3.4.24.15), a mammalian enzyme involved in the cytoplasmic degradation of small
peptides.

- Neurolysin (EC 3.4.24.16) (also known as mitochondrial oligopeptidase M or microsomal endopeptidase).

- Mitochondrial intermediate peptidase precursor (EC 3.4.24.59) (MIP). It is involved in the second stage of processing
of some proteins imported in the mitochondrion.

- Yeast saccharolysin (EC 3.4.24.37) (proteinase yscD).

- Escherichia coli and related bacteria dipeptidyl carboxypeptidase (EC 3.4.15.5) (gene dcp).

- Escherichia coli and related bacteria oligopeptidase A (EC 3.4.24.70) (gene opdA or prlC).

- Yeast hypothetical protein YKL134c.

[1717] Family M4

- Thermostable thermolysins (EC 3.4.24.27), and related thermolabile neutral proteases (bacillolysins) (EC
3.4.24.28) from various species of Bacillus.

- Pseudolysin (EC 3.4.24.26) from Pseudomonas aeruginosa (gene lasB).

- Extracellular elastase from Staphylococcus epidermidis.

- Extracellular protease prt1 from Erwinia carotovora.

- Extracellular minor protease smp from Serratia marcescens.

- Vibriolysin (EC 3.4.24.25) from various species of Vibrio.

- Protease prtA from Listeria monocytogenes.

- Extracellular proteinase proA from Legionella pneumophila.

[1718] Family M5 
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! aSSS"* 5 n * «° R YC - Me " 0fS A - J " BaC,efi0 ' 173:5597-5603(199!) 
[ 2]Rawlings N.D., Barrett A.J. Meth. Enzymol. 248:183-228(1995).

[1703] 747. SAM. SAM domain (Sterile alpha motif) 

[3]Medline: 99101382 The crystal structure of an Eoh recentnr q A M • 

o-qumols. This enzyme, found in prokaryotes as well as h mri-^i^T^ nenols and the ox.dat.on of c-d.phenols to 
as melanins and other pofyphenoJc corn^ds * ^ " °' pi9men,S such 

shared by some ^an"^ 

and arthropods [3,4]. '-wniamng oxygen earners from the hemofymph of many molluscs 

[1706] At least two proteins related to tyrosinase are known to exist in mammals: 

" ^-^^%Z^7:^ b,e *• COnVefSi ° n ° f ^^VdroxyindCe^^xylic acid (DHICA) to 

' tTcl^s^^^ **»""«. (EC 5.3.3,2) that cata^zes 

instead of copper [7]. Vros.nases and TRP-1 in that it binds two zinc ions 

[1707] Other proteins that belong to this family are:

- Plant polyphenol oxidases (PPO) (EC 1.10.3.1), which catalyze the oxidation of mono- and o-diphenols to o-diquinones [8].

- Caenorhabditis elegans hypothetical protein C02C2.1.

l zzjz ss sexes: r ess? ™ -? *— *» - - —» - - 

[ 1]Lerch K. Prog. Clin. Biol. Res. 256:85-98(1988).

[ 2]Jackman M.P., Hajnal A., Lerch K. Biochem. J. 274:707-713(1991).

[ 3]Linzen B. Naturwissenschaften 76:206-211(1989).

[ 4]Lang W.H., van Holde K.E. Proc. Natl. Acad. Sci. U.S.A. 88:244-248(1991).

[ 6]Jackson I.J., Chambers D.M., Tsukamoto K., Copeland N.G., Gilbert D.J., Jenkins N.A., Hearing V. EMBO J. 11:527-535(1992).

[ 7]Solano P., Martinez-Liarte J.H., Jimenez-Cervantes C., Garcia-Borron J.C., Lozano J.A. Biochem. Biophys. Res. Commun. 204:1243-1250(1994).

[ 8]Cary J.W., Lax A.R., Flurkey W.H. Plant Mol. Biol. 20:245-253(1992). 

[1710] 749. (MurLigase) Folylpolyglutamate synthase signatures 






- Snake venom metalloproteinases [6]. This subfamily mostly groups proteases that act in hemorrhage. Examples
are: adamalysin II (EC 3.4.24.46), atrolysin C/D (EC 3.4.24.42), atrolysin E (EC 3.4.24.44), fibrolase (EC 3.4.24.72),
trimerelysin I (EC 3.4.24.52) and II (EC 3.4.24.53).

- Mouse cell surface antigen MS2.

[1728] Family M13

- Mammalian neprilysin (EC 3.4.24.11) (neutral endopeptidase) (NEP).

- Endothelin-converting enzyme 1 (EC 3.4.24.71) (ECE-1), which processes the precursor of endothelin to release the
active peptide.

- Kell blood group glycoprotein, a major antigenic protein of erythrocytes. The Kell protein is very probably a zinc 
endopeptidase. 

- Peptidase O from Lactococcus lactis (gene pepO).

[1729] Family M27

- Clostridial neurotoxins, including tetanus toxin (TeTx) and the various botulinum toxins (BoNT). These toxins are
zinc proteases that block neurotransmitter release by proteolytic cleavage of synaptic proteins such as
synaptobrevins, syntaxin and SNAP-25 [7,8].

[1730] Family M30 

- Staphylococcus hyicus neutral metalloprotease.

[1731] Family M32

- Thermostable carboxypeptidase 1 (EC 3.4.17.19) (carboxypeptidase Taq), an enzyme from Thermus aquaticus
which is most active at high temperature.

[1732] Family M34

- Lethal factor (LF) from Bacillus anthracis, one of the three proteins composing the anthrax toxin.

[1733] Family M35

- Deuterolysin (EC 3.4.24.39) from Penicillium citrinum and related proteases from various species of Aspergillus.

[1734] Family M36

- Extracellular elastinolytic metalloproteinases from Aspergillus.

[1735] From the tertiary structure of thermolysin, the positions of the residues acting as zinc ligands and of those involved
in the catalytic activity are known. Two of the zinc ligands are histidines which are very close together in the sequence;
C-terminal to the first histidine is a glutamic acid residue which acts as a nucleophile and promotes the attack of a
water molecule on the carbonyl carbon of the substrate. A signature pattern which includes the two histidines and the
glutamic acid residue is sufficient to detect this superfamily of proteins.
[1736] Description of pattern(s) and/or profile(s) 

Consensus pattern: [GSTALIVN]-x(2)-H-E-[LIVMFYW]-{DEHRKP}-H-x-[LIVMFYWGSPQ] [The
two H's are zinc ligands] [E is the active site residue]
Sequences known to belong to this class detected by the pattern: ALL,
except for members of families M5, M7 and M11.
Other sequence(s) detected in SWISS-PROT: 55; including Neurospora
crassa conidiation-specific protein 13 which could be a
zinc-protease.
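
The pattern notation used for the consensus patterns in this description can be translated mechanically into a regular expression. The following is a minimal sketch, assuming the usual PROSITE conventions (`x` for any residue, `[...]` for allowed residues, `{...}` for excluded residues, `(n)` or `(n,m)` for repeats). The function name `prosite_to_regex` and the test peptide are hypothetical and are not part of the original text.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.split("-"):
        # Separate the residue specification from an optional repeat, e.g. x(2) or x(9,11).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.group(1), m.group(2), m.group(3)
        if core == "x":
            token = "."                        # any residue
        elif core.startswith("[") and core.endswith("]"):
            token = core                       # one of the listed residues
        elif core.startswith("{") and core.endswith("}"):
            token = "[^" + core[1:-1] + "]"    # any residue except those listed
        elif len(core) == 1 and core.isalpha():
            token = core                       # a single fixed residue
        else:
            raise ValueError("unsupported pattern element: " + element)
        if low is not None:
            token += "{" + low + ("," + high if high else "") + "}"
        parts.append(token)
    return "".join(parts)

# Zinc-binding region signature as given in the text.
ZINC_PATTERN = "[GSTALIVN]-x(2)-H-E-[LIVMFYW]-{DEHRKP}-H-x-[LIVMFYWGSPQ]"
ZINC_REGEX = re.compile(prosite_to_regex(ZINC_PATTERN))

# Hypothetical test peptide containing an H-E-x-x-H zinc-binding motif.
peptide = "MKVVAHELTHAVTD"
hit = ZINC_REGEX.search(peptide)
```

Here `hit.group()` is the matched ten-residue segment, and replacing the glutamate of the H-E pair with any other residue removes the match.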

[ 1]Jongeneel C.V., Bouvier J., Bairoch A.

FEBS Lett. 242:211-214(1989). 

[ 2]Murphy G.J.P., Murphy G., Reynolds J.J.

- Mycolysin (EC 3.4.24.31) from Streptomyces cacaoi.
[1719] Family M6 

[1720] Family M7 

- Streptomyces extracellular small neutral proteases.
[1721] Family M8

- Leishmanolysin (EC 3.4.24.36) (surface glycoprotein gp63), a cell surface protease from various species of Leishmania.
[1722] Family M9

- Microbial collagenase (EC 3.4.24.3) from Clostridium perfringens and Vibrio alginolyticus.
[1723] Family M10A

- Serralysin (EC 3.4.24.40), an extracellular metalloprotease from Serratia.

- Alkaline metalloproteinase from Pseudomonas aeruginosa (gene aprA).

- Secreted proteases A, B, C and G from Erwinia chrysanthemi.

- Yeast hypothetical protein YIL108w.

[1724] Family M10B

c f fof Zt , ( C 3 - 4 24 - 24 ) Kd gelatinase). MMP-9 (EC 3.4.24.35) (92 Kd aelatinasel MMP 7 fFr 

XlasJse" (strome| y s,n - 2 )- and M ^ (s«romelysin-3). MMP-12 (EC 3.4.24.65) (macrophage met- 

- Soybean metalloendoproteinase 1.
[1725] Family M11 

- Chlamydomonas reinhardtii gamete lytic enzyme (GLE). 
[1726] Family M12A 

- Astacin (EC 3.4.24.21), a crayfish endoprotease.

- Meprin A (EC 3.4.24.18), a mammalian kidney and intestinal brush border metalloendopeptidase.

pl%Zus } ParaCentf0tUS ,IV ' dus a " d protein SpAN from Strongylocentrotus 

- Caenorhabditis elegans protein toh-2. 

- Caenorhabditis elegans hypothetical protein F42A10.8.

- Choriolysins L and H (EC 3.4.24.67) (also known as embryonic hatching proteins LCE and Hen from th« sch 
[1727] Family M12B
faecalis and synA from Synechococcus contain one copy of the HMA domain. The cadmium ATPases cadA from
Bacillus firmus and from plasmid pI258 from Staphylococcus aureus also contain a single HMA domain, while a
chromosomal Staphylococcus aureus cadA contains two copies. Other, less characterized ATPases that contain
the HMA domain are: fixI from Rhizobium meliloti, pacS from Synechococcus (strain PCC 7942), Mycobacterium
leprae ctpA and ctpB and Escherichia coli hypothetical protein yhhO. In all these ATPases the HMA domain(s) are
located in the N-terminal section.

- Mercuric reductase (EC 1.16.1.1) (gene merA) which is generally encoded by plasmids carried by mercury-resistant
Gram-negative bacteria. Mercuric reductase is a class-1 pyridine nucleotide-disulphide oxidoreductase (see
<PDOC00073>). There is generally one HMA domain (with the exception of a chromosomal merA from Bacillus
strain RC607 which has two) in the N-terminal part of merA.

- Mercuric transport protein periplasmic component (gene merP), also encoded by plasmids carried by mercury- 
resistant Gram-negative bacteria. It seems to be a mercury scavenger that specifically binds to one Hg(2+) ion 
and which passes it to the mercuric reductase via the merT protein. The N-terminal half of merP is a HMA domain.

- Helicobacter pylori copper-binding protein copP.

- Yeast protein ATX1 [2], which could act in the transport and/or partitioning of copper. 

[1746] The consensus pattern for HMA spans the complete domain. 
[1747] Description of pattern(s) and/or profile(s)

Consensus pattern: [LIVN]-x(2)-[LIVMFA]-x-C-x-[STAGCDNH]-C-x(3)-[LIVFG]-x(3)-[LIV]-x(9,11)-[IVA]-x-[LVFYS] [The
two C's probably bind metals]



[ 1]Bull P.C., Cox D.W. Trends Genet. 10:246-252(1994). 

[ 2]Lin S.-J., Culotta V.L Proc. Natl. Acad. Sci. U.S.A. 92:3784-3788(1995). 



[1748] 756. (Peptidase M10) Matrixins cysteine switch 
PROSITE cross-reference(s): CYSTEINE_SWITCH 

Mammalian extracellular matrix metalloproteinases (EC 3.4.24.-), also known as matrixins [1] (see <PDOC00129>),
are zinc-dependent enzymes. They are secreted by cells in an inactive form (zymogen) that differs from the mature
enzyme by the presence of an N-terminal propeptide. A highly conserved octapeptide is found two residues downstream
of the C-terminal end of the propeptide. This region has been shown to be involved in autoinhibition of matrixins [2,3];
a cysteine within the octapeptide chelates the active site zinc ion, thus inhibiting the enzyme. This region has been
called the 'cysteine switch' or 'autoinhibitor region'.
A cysteine switch has been found in the following zinc proteases:

- MMP-1 (EC 3.4.24.7) (interstitial collagenase). 

- MMP-2 (EC 3.4.24.24) (72 Kd gelatinase). 

- MMP-3 (EC 3.4.24.17) (stromelysin-1).

- MMP-7 (EC 3.4.24.23) (matrilysin). 

- MMP-8 (EC 3.4.24.34) (neutrophil collagenase). 

- MMP-9 (EC 3.4.24.35) (92 Kd gelatinase). 

- MMP-10 (EC 3.4.24.22) (stromelysin-2).

- MMP-11 (EC 3.4.24.-) (stromelysin-3). 

- MMP-12 (EC 3.4.24.65) (macrophage metalloelastase). 

- MMP-13 (EC 3.4.24.-) (collagenase 3).

- MMP-14 (EC 3.4.24.-) (membrane-type matrix metalloproteinase 1).

- MMP-15 (EC 3.4.24.-) (membrane-type matrix metalloproteinase 2).

- MMP-16 (EC 3.4.24.-) (membrane-type matrix metalloproteinase 3).

- Sea urchin hatching enzyme (EC 3.4.24.12) (envelysin) [4]. 

- Chlamydomonas reinhardtii gamete lytic enzyme (GLE) [5]. 

[1749] Description of pattern(s) and/or profile(s) 

Consensus pattern: P-R-C-[GN]-x-P-[DR]-[LIVSAPKQ] [C chelates the zinc ion]
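
As an illustration of how the cysteine switch signature might be located in a propeptide, the sketch below writes the pattern P-R-C-[GN]-x-P-[DR]-[LIVSAPKQ] directly as a regular expression and reports the position of the zinc-chelating cysteine. The function name `find_cysteine_switch` and the example peptide are hypothetical and serve only to show the mechanics.

```python
import re

# P-R-C-[GN]-x-P-[DR]-[LIVSAPKQ], with the cysteine at the third position.
CYSTEINE_SWITCH = re.compile(r"PRC[GN].P[DR][LIVSAPKQ]")

def find_cysteine_switch(sequence: str):
    """Return (motif_start, motif, cysteine_index) tuples, all 0-based."""
    return [
        (m.start(), m.group(), m.start() + 2)
        for m in CYSTEINE_SWITCH.finditer(sequence)
    ]

# Hypothetical propeptide fragment used only for illustration.
fragment = "AAMKQPRCGVPDVAQF"
switches = find_cysteine_switch(fragment)
```

For this fragment the octapeptide starts at index 5 and the chelating cysteine sits at index 7. A fragment in which that cysteine is replaced by serine yields an empty list.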
[ 1]Woessner J. Jr. FASEB J. 5:2145-2154(1991). 

[ 2]Sanchez-Lopez R., Nicholson R., Gesnel M.C., Matrisian L.M., Breathnach R. J. Biol. Chem. 263:11892-11899(1988).

[ 3]Park A.J., Matrisian L.M., Kells A.F., Pearson R., Yuan Z., Navre M. J. Biol. Chem. 266:1584-1590(1991).
[ 4]Lepage T., Gache C. EMBO J. 9:3003-3012(1990).
FEBS Lett. 289:4-7(1991).

[ 3]Bode W., Grams F., Reinemer P., Gomis-Rueth F.-X., Baumann U., McKay

D.B., Stoecker W.

Zoology 99:237-246(1996). 

[ 4]Rawlings N.D., Barrett A. J. 

Meth. Enzymol. 248:183-228(1995). 

[ 5]Woessner J. Jr.

FASEB J. 5:2145-2154(1991). 

[ 6]Hite L.A., Fox J.W., Bjarnason J.B.

[ 7]Montecucco C, Schiavo G. 

Trends Biochem. Sci. 18:324-327(1993). 

[ 8]Niemann H., Blasi J., Jahn R. 

Trends Cell Biol. 4:179-185(1994). 

[1737] 751. PseudoU_synt_1 

tRNA pseudouridine synthase is involved in the formation of pseudouridine at the anticodon stem and loop of transfer
RNAs. Pseudouridine is an isomer of uridine (5-(beta-D-ribofuranosyl)uracil) and is the most abundant modified
nucleoside found in all cellular RNAs. The TruA-like proteins also exhibit a conserved sequence with a strictly conserved
aspartic acid, likely involved in catalysis. Number of members: 25

[1738] [1]Medline: 98254513. Transfer RNA-pseudouridine synthetase Pus1 of Saccharomyces cerevisiae contains
one atom of zinc essential for its native conformation and tRNA recognition. Arluison V, Hountondji C, Robert B, Grosjean H;
Biochemistry 1998;37:7263-7276.
[1739] 752. EPSP synthase signatures

EPSP synthase (3-phosphoshikimate 1-carboxyvinyltransferase) (EC 2.5.1.19) catalyzes the sixth step in the biosynthesis
from chorismate of the aromatic amino acids (the shikimate pathway) in bacteria (gene aroA), plants and fungi
(where it is part of a multifunctional enzyme which catalyzes five consecutive steps in this pathway) [1]. EPSP synthase
has been extensively studied as it is the target of the potent herbicide glyphosate which inhibits the enzyme.
[1740] The sequence of EPSP from various biological sources shows that the structure of the enzyme has been well
conserved throughout evolution. Two conserved regions were selected as signature patterns. The first pattern corresponds
to a region that is part of the active site and which is also important for the resistance to glyphosate [2]. The
second pattern is located in the C-terminal part of the protein and contains a conserved lysine which seems to be
important for the activity of the enzyme.
[1741] Description of pattern(s) and/or profile(s)

[1742] Consensus pattern: [LIVM]-x(2)-[GN]-N-[SA]-G-T-[STA]-x-R-x-[LIVMY]
Consensus pattern: [KR]-x-[KH]-E-[CST]-[DN]

[ 1]Stallings W.C., Abdel-Megid S.S., Lim L.W., Shieh H.-S., Dayringer H.E., Leimgruber N.K., Stegeman R.A.,
Anderson K.S., Sikorski J.A., Padgette S.R., Kishore G.M. Proc. Natl. Acad. Sci. U.S.A. 88:5046-5050(1991).
[ 2]Padgette S.R., Re D.B., Gasser C.S., Eichholtz D.A., Frazier R.B., Hironaka C.M., Levine E.B., Shah D.M., Fraley
R.T., Kishore G.M. J. Biol. Chem. 266:22364-22369(1991).

[1743] 753. Glyco_hydro_18

Glycosyl hydrolases family 18. Number of members: 173 

[1]Medline: 95219379. Crystal structure of a bacterial chitinase at 2.3 A resolution. Perrakis A, Tews I, Dauter Z, Oppenheim AB,
Chet I, Wilson KS, Vorgias CE; Structure 1994;2:1169-1180.
[1744] 754. Esterase 
Putative esterase 

This family contains Esterase D Swiss:P10768. However, it is not clear if all members of the family have the same
function. This family is possibly related to the COesterase family.

Number of members: 36

[1745] 755. (HMA) Heavy-metal-associated domain 

A conserved domain of about 30 amino acid residues has been found [1] in a number of proteins that transport or 
detoxify heavy metals. This domain contains two conserved cysteines that could be involved in the binding of these 
metals. The domain has been termed Heavy-Metal-Associated (HMA). It has been found in: 

- A variety of cation transport ATPases (E1-E2 ATPases) (see <PDOC00139>). The human copper ATPases ATP7A
and ATP7B which are respectively involved in Menkes' and Wilson's diseases. ATP7A and ATP7B both contain 6
tandem copies of the HMA domain. The copper ATPases CCC2 from budding yeast, copA from Enterococcus
[ 2]Siezen R.J. (In) Proceeding subtilisin symposium, Hamburg, (1992).
[ 3]Barr P.J. Cell 66:1-3(1991).

[ 4]Shaulsky G., Kuspa A., Loomis W.F. Genes Dev. 9:1111-1122(1995).
[ 5]Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:19-61(1994).

[1752] 758. (SSB) Single-strand binding protein family signatures 
PROSITE cross-reference(s): PS00735; SSB_1, PS00736; SSB_2

The Escherichia coli single-strand binding protein [1] (gene ssb), also known as the helix-destabilizing protein, is a
protein of 177 amino acids. It binds tightly, as a homotetramer, to single-stranded DNA (ss-DNA) and plays an important
role in DNA replication, recombination and repair.

[1753] Closely related variants of SSB are encoded in the genome of a variety of large self-transmissible plasmids.
SSB has also been characterized in bacteria such as Proteus mirabilis or Serratia marcescens.
[1754] Eukaryotic mitochondrial proteins that bind ss-DNA and are probably involved in mitochondrial DNA replication
are structurally and evolutionarily related to prokaryotic SSB. Proteins currently known to belong to this subfamily are
listed below [2].

- Mammalian protein Mt-SSB (P16).

- Xenopus Mt-SSBs and Mt-SSBr.

- Drosophila MtSSB.

- Yeast protein RIM1.

[1755] Two signature patterns have been developed for these proteins. The first is a conserved region in the N-terminal
section of the SSB's. The second is a centrally located region which, in Escherichia coli SSB, is known to be
involved in the binding of DNA.

[1756] Description of pattern(s) and/or profile(s)

Consensus pattern[UVMF]-[NSTHKRT^ 

Consensus pattern: T-x-W-[HY]-[RNS]-[LIVM]-x-[LIVMF]-[FY]-[NGKR]

[ 1]Meyer R.R., Laine P.S. Microbiol. Rev. 54:342-380(1990).
[ 2]Stroumbakis N.D., Li Z., Tolias P.P. Gene 143:171-177(1994).

[1757] 759. KDPG and KHG aldolases active site signatures 

PROSITE cross-reference(s): PS00159; ALDOLASE_KDPG_KHG_1, PS00160; ALDOLASE_KDPG_KHG_2
[1758] 4-hydroxy-2-oxoglutarate aldolase (EC 4.1.3.16) (KHG-aldolase) catalyzes the interconversion of
4-hydroxy-2-oxoglutarate into pyruvate and glyoxylate. Phospho-2-dehydro-3-deoxygluconate aldolase (EC 4.1.2.14)
(KDPG-aldolase) catalyzes the interconversion of 6-phospho-2-dehydro-3-deoxy-D-gluconate into pyruvate and
glyceraldehyde 3-phosphate.

[1759] These two enzymes are structurally and functionally related [1]. They are both homotrimeric proteins of
approximately 220 amino-acid residues. They are class I aldolases whose catalytic mechanism involves the formation
of a Schiff-base intermediate between the substrate and the epsilon-amino group of a lysine residue. In both enzymes
an arginine is required for catalytic activity.

[1760] Two signature patterns were developed for these enzymes. The first one contains the active site arginine and
the second, the lysine involved in the Schiff-base formation.
[1761] Description of pattern(s) and/or profile(s)

Consensus pattern: G-[LIVM]-x(3)-E-[LIV]-T-[LF]-R [R is the active site residue]
Consensus pattern: G-x(3)-[LIVMF]-K-[LF]-F-P-[SA]-x(3)-G [K is involved in Schiff-base formation]

[1762] [ 1] Vlahos C.J., Dekker E.E. J. Biol. Chem. 263:11683-11691(1988).

[1763] 760. AP endonucleases family 1 signatures
PROSITE cross-reference(s): PS00726; AP_NUCLEASE_F1_1, PS00727; AP_NUCLEASE_F1_2, PS00728; AP_NUCLEASE_F1_3

[1764] DNA damaging agents such as the antitumor drugs bleomycin and neocarzinostatin or those that generate
oxygen radicals produce a variety of lesions in DNA. Amongst these is base-loss which forms apurinic/apyrimidinic
(AP) sites or strand breaks with atypical 3' termini. DNA repair at the AP sites is initiated by specific endonuclease
cleavage of the phosphodiester backbone. Such endonucleases are also generally capable of removing blocking
groups from the 3' terminus of DNA strand breaks.

[1765] AP endonucleases can be classified into two families on the basis of sequence similarity. Family 1 groups
the enzymes listed below [1].
[ 5]Kinoshita T., Fukuzawa H., Shimada T., Saito T., Matsuda Y. Proc. Natl. Acad. Sci. U.S.A. 89:4693-4697(1992).
[1750] 757. (Peptidase S8) Serine proteases, subtilase family, active sites

«Wceaiourid»ieresidues * ' ndependent Urgent evolution. The 

from that of the analogous resW °es in tet^ZZZ ^^P 3 ^ 90 ^ 86 "" 6 and histidine) are cornpletefy different 
category of proteases ^ ^ 030 te used as specific to that 

The subtilase family currently includes the following proteases: 

- Alkaline elastase YaB from Bacillus sp. (gene ale).

- Alkaline serine exoprotease A from Vibrio alginolyticus (gene proA).

- Aqualysin I from Thermus aquaticus (gene pstI).

- AspA from Aeromonas salmonicida.

- Bacillopeptidase F (esterase) from Bacillus subtilis (gene bpf).

- C5A peptidase from Streptococcus pyogenes (gene scpA).

- Cell envelope-located proteases PI, PII, and PIII from Lactococcus lactis.

- Extracellular serine protease from Serratia marcescens.

- Extracellular protease from Xanthomonas campestris.

- Intracellular serine protease (ISP) from various Bacillus.

- Minor extracellular serine protease epr from Bacillus subtilis (gene epr).

- Minor extracellular serine protease vpr from Bacillus subtilis (gene vpr).

- Nisin leader peptide processing protease nisP from Lactococcus lactis.

- Serotype-specific antigen 1 from Pasteurella haemolytica (gene ssa1).

- Thermitase (EC 3.4.21.66) from Thermoactinomyces vulgaris.

- Calcium-dependent protease from Anabaena variabilis (gene prcA).

- Halolysin from halophilic bacteria sp. 172p1 (gene hly).

- Alkaline extracellular protease (AEP) from Yarrowia lipolytica (gene xpr2).

- Alkaline proteinase from Cephalosporium acremonium (gene alp).

- Cerevisin (EC 3.4.21.48) (vacuolar protease B) from yeast (gene PRB1).

- Cuticle-degrading protease (pr1) from Metarhizium anisopliae.

- KEX-1 protease from Kluyveromyces lactis.

- Kexin (EC 3.4.21.61) from yeast (gene KEX-2).

- Oryzin (EC 3.4.21.63) (alkaline proteinase) from Aspergillus (gene alp).

- Proteinase K (EC 3.4.21.64) from Tritirachium album (gene proK).

- Proteinase R from Tritirachium album (gene proR).

- Proteinase T from Tritirachium album (gene proT).

- Subtilisin-like protease III from yeast (gene YSP3).

- Thermomycolin (EC 3.4.21.65) from Malbranchea sulfurea.

- Furin (EC 3.4.21.85), neuroendocrine convertases 1 to 3 (NEC-1 to -3) and PACE4 protease from mammals, other
vertebrates, and invertebrates. These proteases are involved in the processing of hormone precursors at sites
comprised of pairs of basic amino acid residues [3].

- Tripeptidyl-peptidase II (EC 3.4.14.10) (tripeptidyl aminopeptidase) from Human.
[1751] Description of pattern(s) and/or profile(s)

"° * * **" Site Si9na,UfeS - - ^ °< » - ~* P-ease 
Note: these proteins belong to family S8 in the classification of peptidases [5,E1].

[ 1]Siezen R.J., de Vos W.M., Leunissen J.A.M., Dijkstra B.W. Protein Eng. 4:719-737(1991).
'C': conserved cysteine involved in a disulfide bond,
'c': optional cysteine involved in a disulfide bond,
'*': position of the pattern. 



[1782] The categories of proteins, in which the CTL domain has been found, are listed below. 

[1783] Type-ll membrane proteins where the CTL domain is located at the C-terminal extremity of the proteins: 

- Asialoglycoprotein receptors (ASGPR) (also known as hepatic lectins) [4]. The ASGPR's mediate the endocytosis
of plasma glycoproteins to which the terminal sialic acid residue in their carbohydrate moieties has been removed. 

- Low affinity immunoglobulin epsilon Fc receptor (lymphocyte IgE receptor), which plays an essential role in the 
regulation of IgE production and in the differentiation of B cells. 

- Kupffer cell receptor. A receptor with an affinity for galactose and fucose, that could be involved in endocytosis. 

- A number of proteins expressed on the surface of natural killer T-cells: NKG2, NKR-P1, YE1/88 (Ly-49), CD69
and on B-cells: CD72, LyB-2. The CTL-domain in these proteins is distantly related to other CTL-domains; it is
unclear whether they are likely to bind carbohydrates.

[1784] Proteins that consist of an N-terminal collagenous domain followed by a CTL-domain [5]; these proteins are
sometimes called 'collectins': 

- Pulmonary surfactant-associated protein A (SP-A). SP-A is a calcium-dependent protein that binds to surfactant
phospholipids and contributes to lower the surface tension at the air-liquid interface in the alveoli of the mammalian
lung.

- Pulmonary surfactant-associated protein D (SP-D).

- Conglutinin, a calcium-dependent lectin-like protein which binds to a yeast cell wall extract and to immune
complexes through the complement component (iC3b).

- Mannan-binding proteins (MBP) (also known as mannose-binding proteins).
MBP's bind mannose and N-acetyl-D-glucosamine in a calcium-dependent
manner.

- Bovine collectin-43 (CL-43).

[1785] Selectins (or LEC-CAM) [6,7]. Selectins are cell adhesion molecules implicated in the interaction of leukocytes
with platelets or vascular endothelium. Structurally, selectins consist of a long extracellular domain, followed by a
transmembrane region and a short cytoplasmic domain. The extracellular domain is itself composed of a CTL-domain,
followed by an EGF-like domain and a variable number of SCR/Sushi repeats. Known selectins are:

- Lymph node homing receptor (also known as L-selectin, leukocyte adhesion
molecule-1 (LAM-1), leu-8, gp90-mel, or LECAM-1).
- Escherichia coli exonuclease III (EC 3.1.11.2) (gene xthA).

- Streptococcus pneumoniae and Bacillus subtilis exonuclease A (gene exoA).

- Mammalian AP endonuclease 1 (AP1) (EC 4.2.99.18).

- Drosophila recombination repair protein 1 (gene Rrp1).

- Arabidopsis thaliana apurinic endonuclease-redox protein (gene arp).

Consensus pattern: N-x-G-x-R-[LIVM]-D-[LIVMFYH]-x-[LV]-x-S
[ 1] Barzilay G., Hickson I.D. BioEssays 17:713-719(1995).

[ 2] Mol C.D., Kuo C.-F., Thayer M.M., Cunningham R.P., Tainer J.A. Nature 374:381-386(1995).

™ ."^^"^r^^^^^^ 3 ^^ ,9naurepat,ern 

£2 762. ?CTF°^ h ; M - MCCa " Um C - Da " 9 - VU "■• T8Ubota S l - G<™ 186 189-195(1997) 

PS0M96^E^TF^yipHA ^' eclror > transfer fetoprotein alpha.ubunit signature, PROS.TE SSrence(s, : 

- Escherichia coli hypothetical protein ydiR. 

- Escherichia coli hypothetical protein ygcQ. 

[1778] A highly conserved region which is located in the C-terminal section was selected as a signature pattern for
these proteins.

[1779] Consensus pattern: [LI]-Y-[LIVM]-[AT]-x-G-[IV]-[SD]-G-x-[IV]-Q-H-x(2)-G-x(6)-[IV]-x-A-[IV]-N

[ 1] Finocchiaro G., Ikeda Y., Ito M., Tanaka K. Prog. Clin. Biol. Res. 321:637-652(1990).
[ 2] Tsai M.H., Saier M.H. Jr. Res. Microbiol. 146:397-404(1995).

[1780] 763. (lectin_c) C-type lectin domain signature and profile

riSfl A Cr °^ TYPE LECTIN 2 

S,e^ 

main, which is known as the C-type S> ^T^TiToT^Z^T 1 ^^ * mHin [1 ^ ™ s *- 
of about 110 to 130 rescues. Th'ere are fouT^l^iS Ire 22 f^T™ *"* ™ —ists 
bonds. A schema,* represents of the CTL^T stot ' *" h ' W ° diSU " ide 
than the pattern, you should use it if you have access to the necessary software tools to do so.

[ 1] Drickamer K. J. Biol. Chem. 263:9557-9560(1988). 

[ 2] Drickamer K. Prog. Nucleic Acid Res. Mol. Biol. 45:207-232(1993). 

[ 3] Drickamer K. Curr. Opin. Struct. Biol. 3:393-400(1993).

[ 4] Spiess M. Biochemistry 29:10009-10018(1990).

[ 5] Weis W.I., Kahn R., Fourme R., Drickamer K., Hendrickson W.A. Science 254:1608-1615(1991).

[ 6] Siegelman M. Curr. Biol. 1:125-128(1991).

[ 7] Lasky L.A. Science 238:964-969(1992).

[ 8] Jomori T., Natori S. J. Biol. Chem. 266:13318-13323(1991).

[ 9] Ng N.F.L., Hew C.-L. J. Biol. Chem. 267:16069-16075(1992).

[1793] 764. (SRCR) Speract receptor repeated domain signature 
PROSITE cross-reference(s): PS00420; SPERACT_RECEPTOR

[1794] The receptor for the sea urchin egg peptide speract is a transmembrane glycoprotein of 500 amino acid
residues [1]. Structurally it consists of a large extracellular domain of 450 residues, followed by a transmembrane
region and a small cytoplasmic domain of 12 amino acids. The extracellular domain contains four repeats of a
115-amino-acid domain. There are 17 positions that are perfectly conserved in the four repeats, among them six
cysteines, six glycines, and three glutamates.

[1795] Such a domain is also found, once, in the C-terminal section of mammalian macrophage scavenger receptor
type I [2], a membrane glycoprotein implicated in the pathologic deposition of cholesterol in arterial walls during
atherogenesis.

[1796] The signature pattern that was derived spans part of the N-terminal section of the domain and contains 8 of 
the 17 conserved residues. 

[1797] Consensus pattern: G-x(5)-G-x(2)-E-x(6)-W-G-x(2)-C-x(3)-[FYW]-x(8)-C-x(3)-G
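
The statement that the signature contains 8 of the 17 conserved residues can be checked by counting the strictly fixed positions in the pattern. The sketch below is one way to do this; the function name `summarize_pattern` is hypothetical, and the parser handles only the element types that occur in this particular pattern (fixed residues, `x` with an optional repeat count, and bracketed alternatives).

```python
import re
from collections import Counter

SRCR_PATTERN = "G-x(5)-G-x(2)-E-x(6)-W-G-x(2)-C-x(3)-[FYW]-x(8)-C-x(3)-G"

def summarize_pattern(pattern: str):
    """Return (span_length, fixed_residues, ambiguous_positions) for a pattern."""
    length = 0
    fixed = []        # positions that allow exactly one residue
    ambiguous = 0     # positions that allow a restricted set, e.g. [FYW]
    for element in pattern.split("-"):
        m = re.fullmatch(r"(x|[A-Z]|\[[A-Z]+\])(?:\((\d+)\))?", element)
        if m is None:
            raise ValueError("unsupported pattern element: " + element)
        core, repeat = m.group(1), int(m.group(2) or 1)
        length += repeat
        if core == "x":
            continue                      # wildcard positions carry no constraint
        if len(core) == 1:
            fixed.extend(core * repeat)   # strictly conserved residue(s)
        else:
            ambiguous += repeat
    return length, fixed, ambiguous

span, fixed_residues, ambiguous_positions = summarize_pattern(SRCR_PATTERN)
composition = Counter(fixed_residues)
```

Under these assumptions the pattern spans 38 residues and fixes eight positions (four glycines, two cysteines, one glutamate and one tryptophan), with one further position restricted to F, Y or W.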

[ 1] Dangott J.J., Jordan J.E., Bellet R.A., Garbers D.L. Proc. Natl. Acad. Sci. U.S.A. 86:2128-2132(1989).

[ 2] Freeman M., Ashkenas J., Rees D.J., Kingsley D.M., Copeland N.G., Jenkins N.A., Krieger M. Proc. Natl.
Acad. Sci. U.S.A. 87:8810-8814(1990).

[1798] 765. Bac_surface_Ag 
Bacterial surface antigen 

This entry includes the following surface antigens: D15 antigen from H.influenzae, OMA87 from P.multocida, OMP85
from N.meningitidis and N.gonorrhoeae. Number of members: 14

[1]Medline: 95255676. The sequencing of the 80-kDa D15 protective surface antigen of Haemophilus influenzae.
Flack FS, Loosmore S, Chong P, Thomas WR; Gene 1995;156:97-99.

[2] Medline: 96333354. Cloning, sequencing, expression, and protective capacity of the oma87 gene encoding the
Pasteurella multocida 87-kilodalton outer membrane antigen. Ruffolo CG, Adler B; Infect Immun 1996;64:
3161-3167.



[1799] 766. BRCA1 C Terminus (BRCT) domain 

The BRCT domain is found predominantly in proteins involved in cell cycle checkpoint functions responsive to DNA
damage. It has been suggested that the Retinoblastoma protein contains a divergent BRCT domain; this has not been
included in this family. The BRCT domain of XRCC1 forms a homodimer in the crystal structure (Medline: 99016060).
This suggests that pairs of BRCT domains
associate as homo- or heterodimers. Number of members: 131

[1] Medline: 96259550. BRCA1 protein products ...Functional motifs... Koonin EV, Altschul SF, Bork P; Nature
Genet 1996;13:266-268.

[2] Medline: 97153217. From BRCA1 to RAP1: A widespread BRCT module closely associated with DNA repair.
Callebaut I, Mornon JP; Febs lett 1997;400:25-30.

[3] Medline: 97186552. A superfamily of conserved domains in DNA damage responsive cell cycle checkpoint
proteins. Bork P, Hofmann K, Bucher P, Neuwald AF, Altschul SF, Koonin EV; Faseb J 1997;11:68-76.
[4] Medline: 97402527. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.
Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ; Nucleic Acids Res 1997;25:
3389-3402.

[5] Medline: 99016060. Structure of an XRCC1 BRCT domain: a new protein-protein interaction module. Zhang




- Endothelial leukocyte adhesion molecule 1 (ELAM-1, E-selectin or LECAM-2). 
[1786] The ligand recognized by ELAM-1 is sialyl-Lewis x. 

- Granule membrane protein 140 (GMP-140, P-selectin, PADGEM, CD62, or LECAM-3).
The ligand recognized by GMP-140 is Lewis x.

[1787] Large proteoglycans that contain a CTL-domain followed by one copy of a SCR/Sushi repeat, in their C-terminal
section:

- Neurocan.
" sig^^ 

' SSSZT* ' r0m r CrOPha9eS ^ Pr ° ,ein mediates *° endocytosis of 

extracellular section consists of a BbraS,t!fi?T a "^ en -P rocess ' n 9 compartiments. DEC-205 

- Silk moth hemocytin^ utorai ^ZTnl^Jt^ * *" ^ ^ ^ °' ,he CTL 

domains (see <PDOC00988>), a CTL selWefence mechanism. It is composed of 2 FA58C 

domain. 2 VWFC domains (see <PDOC00928), and a CTCK (see <PDOC00912». 
[1790] Various other proteins that uniquely consist of a CTL domain:

tam»a«. a Mr. from me unbu Pr*androoaS ™i2u» ? ^ s '"*" ,h * 01 » 

- Eosinophil granule major basic protein (MBP), a cytotoxic protein.
- A galactose specific lectin from a rattlesnake.

- Two subunits of a phospholipase A2 inhibitor from the plasma of a snake (PLI-A and PLI-B).
" c^ch ndin9 Pr ° ,ein (LPS - BP) ,f0m ,he h^Ph of a ( PU * y 

- Sea raven antifreeze protein (AFP) [9]. 

three C*s are involved in disulfide bonds] 

* . 'n"i-j-x-tu 1 N ; ,Hj-x( < i,<,-x(5.6)-[FYWLIVSTAHLIVMSTA]-C [The 
Note: this documentation entry is linked to both a signature pattern and a profile. As the profile is much more sensitive
zyme that catalyzes the hydrolysis of GDP to GMP.
- Potato apyrase (EC 3.6.1.5) (adenosine diphosphatase) (ADPase). Apyrase acts on both ATP and ADP to produce
AMP.

- Mammalian vascular ATP-diphosphohydrolase (EC 3.6.1.5) (also known as lymphoid cell activation antigen CD39).

- Toxoplasma gondii nucleoside-triphosphatases (EC 3.6.1.15) (NTPase). NTPase hydrolyses various nucleoside
triphosphates to produce the corresponding nucleoside mono- and diphosphates. This enzyme is secreted into
the invaded host cell into the parasitophorous vacuole, a specialized compartment where the parasite intracellularly
resides.

- Pea nucleoside-triphosphatases (EC 3.6.1.15) (NTPase).

- Caenorhabditis elegans hypothetical protein C33H5.14.

- Caenorhabditis elegans hypothetical protein R07E4.4.

- Yeast chromosome V hypothetical protein YER005w.

[1 808] The above uncharacterized proteins all seem to be membrane-bound. 

[1809] All these proteins share a number of conserved domains. The best conserved of these domains has been
selected as a signature pattern; it is located in the central section of the proteins.

[1810] Consensus pattern: [LIVM]-x-G-x(2)-E-G-x-[FY]-x-[FW]-[LIVA]-[TAG]-x-N-[HY]
[ 1] Handa M., Guidotti G. Biochem. Biophys. Res. Commun. 218:916-923(1996). 

[ 2] Vasconcelos E.G., Ferreira S.T., de Carvalho T.M.U., de Souza W., Kettlun A.M., Mancilla M., Valenzuela M.A.,
Verjovski-Almeida S. J. Biol. Chem. 271:22139-22145(1996).
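
The consensus patterns quoted throughout this section use the PROSITE notation: `x` is any residue, `[...]` lists allowed residues, `{...}` lists excluded residues, and `(n)` or `(n,m)` is a repeat count. The following is a minimal sketch, not part of the original disclosure, of how such a pattern could be translated into a regular expression and used to scan a protein sequence. The function names and the test sequence are hypothetical, and the N- and C-terminal anchors `<` and `>` are not handled.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression.

    Assumption: elements are separated by '-' and use only x, [..], {..},
    single letters, and optional (n) or (n,m) repeat suffixes.
    """
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        # Split an element such as "x(4,12)" into its core and repeat counts.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, lo, hi = m.group(1), m.group(2), m.group(3)
        if core == "x":
            core = "."                       # any residue
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"   # excluded residues
        # "[ABC]" and single letters are already valid regex fragments.
        if lo is not None:
            core += "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(core)
    return "".join(parts)


def find_matches(pattern: str, sequence: str):
    """Return (1-based start, matched text) for each non-overlapping hit."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.group()) for m in regex.finditer(sequence)]


if __name__ == "__main__":
    gda1_cd39 = "[LIVM]-x-G-x(2)-E-G-x-[FY]-x-[FW]-[LIVA]-[TAG]-x-N-[HY]"
    # Synthetic sequence built to contain the motif; not a real protein.
    print(prosite_to_regex(gda1_cd39))
    print(find_matches(gda1_cd39, "MKLAGAAEGAFAWLTANHQQ"))
```

A pattern hit only indicates that a sequence is a candidate member of the family; as the notes in this section point out, profiles are more sensitive than patterns where both exist.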

[1811] 771. GTP cyclohydrolase I signatures 

PROSITE cross-reference(s): GTP_CYCLOHYDROL_1_1, GTP_CYCLOHYDROL_1_2

GTP cyclohydrolase I (EC 3.5.4.16) catalyzes the biosynthesis of formic acid and dihydroneopterin triphosphate from GTP.
This reaction is the first step in the biosynthesis of tetrahydrofolate in prokaryotes, of tetrahydrobiopterin in
vertebrates, and of pteridine-containing pigments in insects.

[1812] GTP cyclohydrolase I is a protein of from 190 to 250 amino acid residues. The comparison of the sequence 
of the enzyme from bacterial and eukaryotic sources shows that the structure of this enzyme has been extremely well 
conserved throughout evolution [1]. 

[1813] Two conserved regions were selected as signature patterns. The first contains a perfectly conserved tetrapeptide
which is part of the GTP-binding pocket [2]; the second region also contains conserved residues involved in GTP
binding.

[1814] Consensus pattern: [DEN]-[LIVM](2)-x(2)-[KRNQ]-[DEN]-[LIVM]-x(3)-[ST]-x-C-E-H-H
Consensus pattern: [SA]-x-[RK]-x-Q-[LIVM]-Q-E-[RN]-[LI]-[TSN]

[ 1] Maier J., Witter K., Guetlich M., Ziegler I., Werner T., Ninnemann H. Biochem. Biophys. Res. Commun.
212:705-711(1995).

[ 2] Nar H., Huber R., Meining W., Schmid C., Weinkauf S., Bacher A. Structure 3:459-466(1995).
[1815] 772. IlvC. Acetohydroxy acid isomeroreductase

Acetohydroxy acid isomeroreductase catalyses the conversion of acetohydroxy acids into dihydroxy valerates. This 
reaction is the second in the synthetic pathway of the essential branched side chain amino acids valine and isoleucine. 
Number of members: 29 

[1816] [1] Medline: 97361822. The crystal structure of plant acetohydroxy acid isomeroreductase complexed with 
NADPH, two magnesium ions and a herbicidal transition state analog determined at 1.65 A resolution. Biou V, Dumas 
R, Cohen-Addad C. Douce R, Job D, Pebay-Peyroula E; EMBO J 1997;16:3405-3415. 
[1817] 773. Prokaryotic membrane lipoprotein lipid attachment site
PROSITE cross-reference(s): PROKAR_LIPOPROTEIN

In prokaryotes, membrane lipoproteins are synthesized with a precursor signal peptide, which is cleaved by a specific
lipoprotein signal peptidase (signal peptidase II). The peptidase recognizes a conserved sequence and cuts upstream
of a cysteine residue to which a glyceride-fatty acid lipid is attached [1]. Some of the proteins known to undergo such
processing currently include (for recent listings see [1,2,3]):

- Major outer membrane lipoprotein (murein-lipoproteins) (gene lpp).
- Escherichia coli lipoprotein-28 (gene nlpA).
- Escherichia coli lipoprotein-34 (gene nlpB).
X, Morera S, Bates PA, Whitehead PC, Coffer AI, Hainbucher K, Nash RA, Sternberg MJ, Lindahl T, Freemont PS;
[1800] 767. Kappa casein 

(kappa-casein) and a soluble hydrophilic glycopeptide (caseinomacropeptide)

PROSITE cross-reference(s): CHITINASE_18

a) Chitinases from:

- Bacteria such as Alteromonas, Bacillus, Serratia, Streptomyces, etc.

- Plants such as Arabidopsis, cucumber, bean, tobacco, etc.

- Fungi such as Aphanocladium, Rhizopus, Saccharomyces, etc.

- Nematode (Brugia malayi).
- Insects (Manduca sexta).

- Baculoviruses (Autographa californica nuclear polyhedrosis virus).

b) Other proteins:

- Hevamine, a rubber tree protein with chitinase and lysozyme activities.

- Kluyveromyces lactis killer toxin alpha subunit, which acts as a chitinase.

- Flavobacterium and Streptomyces endo-beta-N-acetylglucosaminidases (EC 3.2.1

- Jack bean concanavalin B (conB), a protein that has lost its catalytic activity.

-TL^eS^^^ 

the best conserved region^ ^ese proZ "* 35 3 *"* ThiS « * ■» of 

[1804] Consensus pattern: [LIVMFY]-[DN]-G-[LIVMF]-[DN]-[LIVMF]-[DN]-x-E [E is the active site residue]

[ 1] Flach J., Pilet P.-E., Jolles P. Experientia 48:701-716(1992).
[ 2] Henrissat B. Biochem. J. 280:309-316(1991).

[ 3] Watanabe T., Kobori K., Miyashita K., Fujii T., Sakai H., Uchida M., Tanaka H. J. Biol. Chem. 268:18567-18572
[ 4] Perrakis A., Tews I., Dauter Z., Oppenheim A.B., Chet I., Wilson K.S., Vorgias C.E. Structure 2:1169-1180
[ 5] van Scheltinga A.C.T., Kalk K.H., Beintema J.J., Dijkstra B.W. Structure 2:1181-1189(1994).
[1805] 769. gag_p17. gag gene protein p17 (matrix protein)

l^Z^TJ^lSt" 1 '* ^ aSS0Cia,ed ^ *• - - — immunodefciency 

PROSITE cross-reference(s): GDA1_CD39_NTPASE
[1822] Class-II tRNA synthetases do not share a high degree of similarity; however, at least three conserved regions
are present [2,5,8]. Signature patterns from two of these regions have been derived.
[1823] Consensus pattern: [FYH]-R-x-[DE]-x(4,12)-[RH]-x(3)-F-x(3)-[DE]
Consensus patterr(GSTALVFHDENGHRKPHGSTA^ 

[ 1] Schimmel P. Annu. Rev. Biochem. 56:125-158(1987).
[ 2] Delarue M., Moras D. BioEssays 15:675-687(1993).
[ 3] Schimmel P. Trends Biochem. Sci. 16:1-3(1991).

[ 4] Nagel G.M., Doolittle R.F. Proc. Natl. Acad. Sci. U.S.A. 88:8121-8125(1991).
[ 5] Cusack S., Haertlein M., Leberman R. Nucleic Acids Res. 19:3489-3498(1991).
[ 6] Cusack S. Biochimie 75:1077-1081(1993).

[ 7] Cusack S., Berthet-Colominas C., Haertlein M., Nassar N., Leberman R. Nature 347:249-255(1990).
[ 8] Leveque F., Plateau P., Dessen P., Blanquet S. Nucleic Acids Res. 18:305-312(1990).

[1824] 775. X. Trans-activation protein X 

This protein is found in hepadnaviruses where it is indispensable for replication. Number of members: 91 
[1825] 776. Thymidylate synthase active site 

[1826] Thymidylate synthase (EC 2.1.1.45) [1,2] catalyzes the reductive methylation of dUMP to dTMP with
concomitant conversion of 5,10-methylenetetrahydrofolate to dihydrofolate. Thymidylate synthase plays an essential role
in DNA synthesis and is an important target for certain chemotherapeutic drugs.

[1827] Thymidylate synthase is an enzyme of about 30 to 35 Kd in most species, except in protozoa and plants,
where it exists as a bifunctional enzyme that includes a dihydrofolate reductase domain.

[1828] A cysteine residue is involved in the catalytic mechanism (it covalently binds the 5,6-dihydro-dUMP
intermediate). The sequence around the active site of this enzyme is conserved from phages to vertebrates.

[1829] Consensus pattern: R-x(2)-[LIVM]-x(3)-[FW]-[QN]-x(8,9)-[LV]-x-P-C-[HAVM]-x(3)-[QMT]-[FYW]-x-[LV] [C is
the active site residue]

[ 1] Benkovic S.J. Annu. Rev. Biochem. 49:227-251(1980).

[ 2] Ross P., O'Gara F., Condon S. Appl. Environ. Microbiol. 56:2156-2163(1990).
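
Because the thymidylate synthase pattern is annotated with its catalytic residue, a match can be used to report the position of the putative active-site cysteine as well as the presence of the motif. The sketch below is illustrative only: it assumes the pattern reads as reconstructed above, hand-translates it into a regular expression with a named group around the cysteine, and uses a synthetic test sequence. The names `TS_ACTIVE_SITE` and `active_site_cysteines` are hypothetical.

```python
import re

# Hand translation of
# R-x(2)-[LIVM]-x(3)-[FW]-[QN]-x(8,9)-[LV]-x-P-C-[HAVM]-x(3)-[QMT]-[FYW]-x-[LV]
# with the annotated active-site cysteine captured in the group "cys".
TS_ACTIVE_SITE = re.compile(
    r"R.{2}[LIVM].{3}[FW][QN].{8,9}[LV].P(?P<cys>C)[HAVM].{3}[QMT][FYW].[LV]"
)


def active_site_cysteines(sequence: str):
    """Return the 1-based positions of cysteines sitting in a pattern match."""
    return [m.start("cys") + 1 for m in TS_ACTIVE_SITE.finditer(sequence)]


if __name__ == "__main__":
    # Synthetic sequence constructed to satisfy the pattern; not a real enzyme.
    demo = "MM" + "RAALAAAFQ" + "A" * 8 + "LAPCHAAAQFAL"
    print(active_site_cysteines(demo))
```

Reporting the residue position rather than a yes/no answer makes it straightforward to cross-check a hit against mutagenesis or structural data for the same sequence.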

[1830] 777. Glycosyl hydrolases family 31 signatures 

[1831] It has been shown [1,2,3,E1] that the following glycosyl hydrolases can be, on the basis of sequence
similarities, classified into a single family:

- Lysosomal alpha-glucosidase (EC 3.2.1.20) (acid maltase) is a vertebrate glycosidase active at low pH, which
hydrolyzes alpha(1->4) and alpha(1->6) linkages in glycogen, maltose, and isomaltose.

- Alpha-glucosidase (EC 3.2.1.20) from the yeast Candida tsukubaensis.

- Alpha-glucosidase (EC 3.2.1.20) (gene malA) from the archaebacterium Sulfolobus solfataricus.

- Intestinal sucrase-isomaltase (EC 3.2.1.48 / EC 3.2.1.10) is a vertebrate membrane-bound, multifunctional enzyme
complex which hydrolyzes sucrose, maltose and isomaltose. The sucrase and isomaltase domains of the enzyme
are homologous (41% of amino acid identity) and have most probably evolved by duplication.

- Glucoamylase 1 (EC 3.2.1.3) (glucan 1,4-alpha-glucosidase) from various fungal species.
- Yeast hypothetical protein YBR229c.

- Fission yeast hypothetical protein SpAC30D11.01c.

[1832] An aspartic acid has been implicated [4] in the catalytic activity of sucrase, isomaltase, and lysosomal
alpha-glucosidase. The region around this active residue is highly conserved and can be used as a signature pattern. A
second region, which contains two conserved cysteines, has been used as an additional signature pattern.
[1833] Consensus pattern: [GF]-[LIVMF]-W-x-D-M-[NSA]-E [D is the active site residue]
Consensus pattern G^Av>D-[LIV^]-C-G-[FY]-^ 

[ 1] Henrissat B. Biochem. J. 280:309-316(1991). 

[ 2] Kinsella B.T., Hogan S., Larkin A., Cantwell B.A. Eur. J. Biochem. 202:657-664(1991).

[ 3] Naim H.Y., Niermann T., Kleinhans U., Hollenberg C.P., Strasser A.W.M. FEBS Lett. 294:109-112(1991).

[ 4] Hermans M.M.P., Kroos M.A., van Beeumen J., Oostra B.A., Reuser A.J.J. J. Biol. Chem. 266:13507-13512(1991).



[1834] 778. Urease signatures 
- Escherichia coli lipoprotein nlpC.

- Escherichia coli lipoprotein nlpD.

- Escherichia coli osmotically inducible lipoprotein B (gene osmB) 

- Escherichia coli osmotically inducible lipoprotein E (gene osmE) 

- Escherichia coli peptidoglycan-associated lipoprotein (gene pal) 

- Escherichia coli rare lipoproteins A and B (genes rplA and rplB). 

- Escherichia coli copper homeostasis protein cutF (or nlpE) 

- Escherichia coli plasmids traT proteins. 

- Escherichia coli Col plasmids lysis proteins. 

- A number of Bacillus beta-lactamases. 

- Bacillus subtilis periplasmic oligopeptide-binding protein (gene oppA)

- Borrelia burgdorferi outer surface proteins A and B (genes ospA and ospB)

- Borrelia hermsii variable major protein 21 (gene vmp21) and 7 (gene vmp7)

- Chlamydia trachomatis outer membrane protein 3 (gene omp3) 

- Fibrobacter succinogenes endoglucanase cel-3. 

- Haemophilus influenzae proteins Pal and Pep. 

- Klebsiella pullulanase (gene pulA).

- Klebsiella pullulanase secretion protein pulS.
- Mycoplasma hyorhinis protein p37.

- Mycoplasma hyorhinis variant surface antigens A, B, and C (genes vlpABC)

- Neisseria outer membrane protein H.8

- Pseudomonas aeruginosa lipopeptide (gene lppL).

- Pseudomonas solanacearum endoglucanase egl 

- Rhodopseudomonas viridis reaction center cytochrome subunit (gene cytC) 
- Rickettsia 17 Kd antigen.

- Shigella flexneri invasion plasmid proteins mxiJ and mxiM 

- Streptococcus pneumoniae oligopeptide transport protein A (gene amiA) 

- Treponema pallidum 34 Kd antigen.

- Treponema pallidum membrane protein A (gene tmpA)

- Vibrio harveyi chitobiase (gene chb). 

- Yersinia virulence plasmid protein yscJ. 

[ 1] Hayashi S., Wu H.C. J. Bioenerg. Biomembr. 22:451-471(1990).
[ 2] Klein P., Somorjai R.L., Lau P.C.K. Protein Eng. 2:15-20(1988).
[ 3] von Heijne G. Protein Eng. 2:531-534(1989).

[ 4] Mattar S., Scharf B., Kent S.B.H., Rodewald K., Oesterhelt D., Engelhard M. J. Biol. Chem. 269:14939-14945
774. Aminoacyl-transfer RNA synthetases class-II signatures

PROSITE cross-reference(s): AA_TRNA_LIGASE_II_1, AA_TRNA_LIGASE_II_2
synthetLes^etoreS e ^ 

each different amino acid: one cytosolic form land T^^Z ^ ^ ^^^^^^^ 

in their catalytk: domain for ie^ISwIfi^ * [2 * 6] 30d <> robabi Y have a common folding pattern 

the class I synthetase^ ° "* ^ ° to Rossmann ««* for 
[1844] Structurally, all known receptor PTPases are made up of a variable-length extracellular domain, followed by
a transmembrane region and a C-terminal catalytic cytoplasmic domain. Some of the receptor PTPases contain
fibronectin type III (FN-III) repeats, immunoglobulin-like domains, MAM domains or carbonic anhydrase-like domains
in their extracellular region. The cytoplasmic region generally contains two copies of the PTPase domain. The first
seems to have enzymatic activity, while the second is inactive but seems to affect substrate specificity of the first. In
these domains, the catalytic cysteine is generally conserved but some other, presumably important, residues are not.
[1845] In the following table, the domain structure of known receptor PTPases is shown:



                                        Extracellular            Intracellular
                                        Ig    FN-3   CAH   MAM   PTPase
Leukocyte common antigen (LCA) (CD45)   0     2      0     0     2
Leukocyte antigen related (LAR)         3     8      0     0     2
Drosophila DLAR                         3     9      0     0     2
Drosophila DPTP                         2     2      0     0     2
PTP-alpha (LRP)                         0     0      0     0     2
PTP-beta                                0     16     0     0     1
PTP-gamma                               0     1      1     0     2
PTP-delta                               0     >7     0     0     2
PTP-epsilon                             0     0      0     0     2
PTP-kappa                               1     4      0     1     2
PTP-mu                                  1     4      0     1     2
PTP-zeta                                0     1      1     0     2
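
The domain table above lends itself to a simple tabular data structure. The following sketch, which is not part of the original disclosure, encodes the rows as named tuples and shows one way to query them; the type and function names are hypothetical, and the FN-3 count for PTP-delta is kept as the string ">7" exactly as given in the table.

```python
from typing import NamedTuple, Union


class ReceptorPTPase(NamedTuple):
    name: str
    ig: int                  # immunoglobulin-like domains
    fn3: Union[int, str]     # fibronectin type III repeats (">7" kept verbatim)
    cah: int                 # carbonic anhydrase-like domains
    mam: int                 # MAM domains
    ptpase: int              # intracellular PTPase domains


RECEPTOR_PTPASES = [
    ReceptorPTPase("LCA (CD45)", 0, 2, 0, 0, 2),
    ReceptorPTPase("LAR", 3, 8, 0, 0, 2),
    ReceptorPTPase("Drosophila DLAR", 3, 9, 0, 0, 2),
    ReceptorPTPase("Drosophila DPTP", 2, 2, 0, 0, 2),
    ReceptorPTPase("PTP-alpha (LRP)", 0, 0, 0, 0, 2),
    ReceptorPTPase("PTP-beta", 0, 16, 0, 0, 1),
    ReceptorPTPase("PTP-gamma", 0, 1, 1, 0, 2),
    ReceptorPTPase("PTP-delta", 0, ">7", 0, 0, 2),
    ReceptorPTPase("PTP-epsilon", 0, 0, 0, 0, 2),
    ReceptorPTPase("PTP-kappa", 1, 4, 0, 1, 2),
    ReceptorPTPase("PTP-mu", 1, 4, 0, 1, 2),
    ReceptorPTPase("PTP-zeta", 0, 1, 1, 0, 2),
]


def with_domain(field: str):
    """Names of receptors whose count for the given column is non-zero."""
    return [p.name for p in RECEPTOR_PTPASES if getattr(p, field) != 0]


if __name__ == "__main__":
    print("MAM-containing:", with_domain("mam"))
    print("CAH-containing:", with_domain("cah"))
    print("Single PTPase domain:",
          [p.name for p in RECEPTOR_PTPASES if p.ptpase == 1])
```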



[1846] PTPase domains consist of about 300 amino acids. There are two conserved cysteines; the second one has
been shown to be absolutely required for activity. Furthermore, a number of conserved residues in its immediate vicinity
have also been shown to be important.

[1847] A signature pattern was derived for PTPase domains centered on the active site cysteine.
[1848] There are three profiles for PTPases: the first one spans the complete domain and is not specific to any
subtype, the second is specific to dual-specificity PTPases, and the third to the PTP subfamily.
[1849] Consensus pattern: [LIVMF]-H-C-x(2)-G-x(3)-[STC]-[STAGP]-x-[LIVMFY] [C is the active site residue]
[1850] Note: the M-phase inducer phosphatases (cdc25-type phosphatases) are tyrosine-protein phosphatases that
are not structurally related to the above PTPases.

[1851] Note: this documentation entry is linked to both a signature pattern and to profiles. As profiles are much more
sensitive than the pattern, you should use them if you have access to the necessary software tools to do so.

[ 1] Fischer E.H., Charbonneau H., Tonks N.K. Science 253:401-406(1991).
[ 2] Charbonneau H., Tonks N.K. Annu. Rev. Cell Biol. 8:463-493(1992).
[ 3] Trowbridge I.S. J. Biol. Chem. 266:23517-23520(1991).
[ 4] Tonks N.K., Charbonneau H. Trends Biochem. Sci. 14:497-500(1989).
[5] Hunter T. Cell 58:1013-1016(1989). 



[1852] 780. Connexins signatures 

[1853] Gap junctions [1] are specialized regions of the plasma membrane which consist of closely packed pairs of
transmembrane channels, the connexons, through which small molecules diffuse from a cell to a neighboring cell. Each
connexon is composed of a hexamer of an integral membrane protein which is often referred to as connexin. In a
given species there are a number of different, yet structurally related, tissue-specific forms of connexins. The types
of connexins which are currently known are listed below:



Connexin 56 (Cx56). 

Connexin 50 (Cx50) (lens fiber protein MP70). 

Connexin 46 (Cx46) (alpha-3). 

Connexin 45 (Cx45) (alpha-6). 

Connexin 43 (Cx43) (alpha-1). 

Connexin 40 (Cx40) (alpha-5). 

Connexin 38 (Cx38) (alpha-2). 
[1835] Urease (EC 3.5.1.5) is a nickel-binding enzyme that catalyzes the hydrolysis of urea to carbon dioxide and
ammonia [1]. Historically, it was the first enzyme to be crystallized (in 1926).

[1838] Consensus pattern: T-[AY]-[GA]-[GAT]-[LIVM]-D-x-H-[LIVM]-H-x(3)-P [The two H's bind nickel]
Consensus pattern IUVII|(2MCTHHHI^^ 

[ 1] Takishima K., Suga T., Mamiya G. Eur. J. Biochem. 175:151-165(1988) 

[ 2] Mobley H.L.T., Hausinger R.P. Microbiol. Rev. 53:85-108(1989).

[ 3] Jabri E., Carr M.B., Hausinger R.P., Karplus P.A. Science 268:998-1004(1995).

[1839] 779. Tyrosine specific protein phosphatases signature and profiles

[1841] Soluble PTPases. 

- PTPN1 (PTP-1B). 

- PTPN2 (T-cell PTPase; TC-PTP).

- PTPN3 (H1) and PTPN4 (MEG), enzymes that contain an N-terminal band 4.1-like domain (see <PDOC00566>)
and could act at junctions between the membrane and cytoskeleton.

- PTPN7 (LC-PTP; hematopoietic protein-tyrosine phosphatase; HePTP).

- PTPN8 (70Z-PEP).

- PTPN9 (MEG2).

- PTPN12 (PTP-G1; PTP-P19).

- Yeast PTP1.

- Yeast PTP2, which may be involved in the ubiquitin-mediated protein degradation pathway.
- Fission yeast pyp1 and pyp2, which play a role in inhibiting the onset of mitosis.

- Fission yeast pyp3, which contributes to the dephosphorylation of cdc2.

- Yeast CDC14, which may be involved in chromosome segregation.

- Yersinia virulence plasmid PTPases (gene yopH).

- Autographa californica nuclear polyhedrosis virus 19 Kd PTPase.

[1842] Dual specificity PTPases.

- DUSP1 (PTPN10; MAP kinase phosphatase-1; MKP-1), which dephosphorylates MAP kinase on both Thr-183 and Tyr-185.

- DUSP2 (PAC-1), a nuclear enzyme that dephosphorylates MAP kinases ERK1 and ERK2 on both Thr and Tyr residues.

- DUSP3 (VHR).

- DUSP4 (HVH2).

- DUSP5 (HVH3).

- DUSP6 (Pyst1; MKP-3).

- DUSP7 (Pyst2; MKP-X).

- Yeast MSG5, a PTPase that dephosphorylates MAP kinase FUS3.

- Yeast YVH1.

- Vaccinia virus H1 PTPase, a dual specificity phosphatase.
[1843] Receptor PTPases.
[1860] It has been proposed that this hexapeptide sequence is responsible for a post-translational modification
necessary for the proper anchoring of the proteins which bear it to the cell wall.
Proteins known to contain such a hexapeptide are listed below:

- Aggregation substance from Streptococcus faecalis (asa1).
- C5a peptidase from Streptococcus pyogenes (scpA).
- C protein alpha-antigen from Streptococcus agalactiae (bca).
- Cell surface antigen I/II (PAC) from Streptococcus mutans.
- Dextranase from Streptococcus downei (dex).
- Fibronectin-binding protein from Staphylococcus aureus (fnbA).
- Fimbrial subunits from Actinomyces naeslundii and viscosus.
- IgA binding protein from Streptococcus pyogenes (arp4).
- IgA binding protein (B antigen) from Streptococcus agalactiae (bag).
- IgG binding proteins from Streptococci and Staphylococcus aureus.
- Internalin A from Listeria monocytogenes (inlA).
- M proteins from streptococci.
- Muramidase-released protein from Streptococcus suis (mrp).
- Nisin leader peptide processing protease from Lactococcus lactis (nisP).
- Protein A from Staphylococcus aureus.
- Trypsin-resistant surface T protein from streptococci.
- Wall-associated protein from Streptococcus mutans (wapA).
- Wall-associated serine proteinases from Lactococcus lactis.

[1861] Consensus pattern: L-P-x-T-G-[STGAVDE]

[1862] [ 1] Schneewind O., Jones K.F., Fischetti V.A. J. Bacteriol. 172:3310-3317(1990).
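
As an illustration of how the anchoring hexapeptide might be located in a candidate surface protein, the sketch below scans a sequence for L-P-x-T-G-[STGAVDE] and reports, for each hit, how many residues remain between the motif and the C-terminus. It is a hedged example under stated assumptions: the function name and the test sequence are hypothetical, and no threshold on the distance to the C-terminus is taken from the source.

```python
import re

# Direct translation of the consensus pattern L-P-x-T-G-[STGAVDE].
ANCHOR_HEXAPEPTIDE = re.compile(r"LP.TG[STGAVDE]")


def find_anchor_hexapeptides(sequence: str):
    """Return (1-based start, hexapeptide, residues remaining after it)."""
    return [
        (m.start() + 1, m.group(), len(sequence) - m.end())
        for m in ANCHOR_HEXAPEPTIDE.finditer(sequence)
    ]


if __name__ == "__main__":
    # Synthetic sequence: a short leader, the motif, then a 20-residue tail.
    demo = "MKK" + "LPETGE" + "A" * 20
    print(find_anchor_hexapeptides(demo))
```

Reporting the number of residues downstream of the motif lets a reader judge whether a hit sits near the C-terminal end, where the anchoring region of these proteins is described as lying.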
[1863] 782. Gamma-glutamyltranspeptidase signature

[1864] Gamma-glutamyltranspeptidase (EC 2.3.2.2) (GGT) [1] catalyzes the transfer of the gamma-glutamyl moiety 
of glutathione to an acceptor that may be an amino acid, a peptide or water (forming glutamate). GGT plays a key role 
in the gamma-glutamyl cycle, a pathway for the synthesis and degradation of glutathione. In prokaryotes and eukary- 
otes, it is an enzyme that consists of two polypeptide chains, a heavy and a light subunit, processed from a single 
chain precursor. The active site of GGT is known to be located in the light subunit. 

[1865] The sequences of mammalian and bacterial GGT show a number of regions of high similarity [2]. Pseudomonas
cephalosporin acylases (EC 3.5.1.-) that convert 7-beta-(4-carboxybutanamido)-cephalosporanic acid (GL-7ACA)
into 7-aminocephalosporanic acid (7ACA) and glutaric acid are evolutionarily related to GGT and also show
some GGT activity [3]. Like GGT, these GL-7ACA acylases are also composed of two subunits.

[1866] One of the conserved regions corresponds to the N-terminal extremity of the mature light chains of these
enzymes. This region has been used as a signature pattern.

[1867] Consensus pattemT-[STA]-H-x-[ST].[LIV^^ 

[ 1] Tate S.S., Meister A. Meth. Enzymol. 113:400-419(1985). 

[ 2] Suzuki H., Kumagai H., Echigo T., Tochikura T. J. Bacteriol. 171:5169-5172(1989).
[ 3] Ishiye M., Niwa M. Biochim. Biophys. Acta 1132:233-239(1992). 

[1868] 783. Ferrochelatase signature 

[1869] Ferrochelatase (EC 4.99.1.1) (protoheme ferro-lyase) [1,2] catalyzes the last step in heme biosynthesis: the
chelation of a ferrous ion to protoporphyrin IX, to form protoheme.

[1870] In eukaryotes, ferrochelatase is a mitochondrial protein bound to the inner membrane, whose active site faces
the mitochondrial matrix. The mature form of eukaryotic ferrochelatase is composed of about 360 amino acids. In
bacteria, ferrochelatase (gene hemH) [3] is a protein of 310 to 380 amino acids.

[1871] The human autosomal dominant disease protoporphyria is due to the reduced activity of ferrochelatase.
[1872] The signature pattern for this enzyme is based on a conserved region which contains a histidine residue which
could be involved in binding iron.

[1873] Consensus pattern: [LIVMF](2)-x-[ST]-x-H-[GS]-[LIVM]-P-x(4,5)-[DENQKR]-x-G-[DP]-x(1,2)-Y

[ 1] Labbe-Bois R. J. Biol. Chem. 265:7278-7283(1990).

[ 2] Brenner D.A., Frasier F. Proc. Natl. Acad. Sci. U.S.A. 88:849-853(1991).

[ 3] Miyamoto K., Nakahigashi K., Nishimura K., Inokuchi H. J. Mol. Biol. 219:393-398(1991).
Connexin 37 (Cx37) (alpha-4).
Connexin 33 (Cx33) (alpha-7).
Connexin 32 (Cx32) (beta-1). 
Connexin 31.1 (Cx31.1) (beta-4). 
Connexin 31 (Cx31) (beta-3). 
Connexin 30.3 (Cx30.3) (beta-5). 
Connexin 26 (Cx26) (beta-2). 
[1856] Consensus pattemC^x^Sl?^

sensus PattemC-x(3.4)-P-C^^ " T bond " Con " 

bonds] 11 11 T n UVM J I>AJ-[KR]-P (The three Cs are involved in disulfide 

[ 1] Goodenough D.A., Goliger J.A., Paul D.L. Annu. Rev. Biochem. 65:475-502(1996).
781. Gram-positive cocci surface proteins 'anchoring' hexapeptide

S ^aCopSVZ^ *— a ^ residues down- 

This structure is represented in SSS^^^S^^^ * * *** * ^ ««» " 
+--------------------------------------------+-+--------+-+
| Variable length extracellular domain       |H| Anchor |B|
+--------------------------------------------+-+--------+-+

'H': conserved hexapeptide.
dimer, a by-product of nylon manufacture [4].
- Glutamyl-tRNA(Gln) amidotransferase subunit A [5].
- Mammalian fatty acid amide hydrolase (gene FAAH) [6].
- A putative amidase from yeast (gene AMD2).

- Mycobacterium tuberculosis putative amidases amiA2, amiB2, amiC and amiD.

[1881] All these enzymes contain in their central section a highly conserved region rich in glycine, serine, and alanine
residues. This region has been used as a signature pattern. 
Consensus pattern: G-[GA]-S-[GSHGSj-G-x-[GSA]-(GSAVThx-[LW^^ 
[UVM]-R-x-P-[GSAC] 

[ 1] Mayaux J.-F., Cerbelaud E., Soubrier F., Faucher D., Petre D. J. Bacteriol. 172:6764-6773(1990).

[ 2] Hashimoto Y., Nishiyama M., Ikehata O., Horinouchi S., Beppu T. Biochim. Biophys. Acta 1088:225-233(1991).

[ 3] Chang T.-H., Abelson J. Nucleic Acids Res. 18:7180-7180(1990).

[ 4] Tsuchiya K., Fukuyama S., Kanzaki N., Kanagawa K., Negoro S., Okada H. J. Bacteriol. 171:3187-3191(1989).
[ 5] Curnow A.W., Hong K.W., Yuan R., Kim S.I., Martins O., Winkler W., Henkin T.M., Soll D. Proc. Natl. Acad.
Sci. U.S.A. 94:11819-11826(1997).

[ 6] Cravatt B.F., Giang D.K., Mayfield S.P., Boger D.L., Lerner R.A., Gilula N.B. Nature 384:83-87(1996).
[1882] 786. Glycosyl hydrolases family 10 active site

[1883] The microbial degradation of cellulose and xylans requires several types of enzymes such as endoglucanases
(EC 3.2.1.4), cellobiohydrolases (EC 3.2.1.91) (exoglucanases), or xylanases (EC 3.2.1.8) [1,2]. Fungi and bacteria
produce a spectrum of cellulolytic enzymes (cellulases) and xylanases which, on the basis of sequence similarities,
can be classified into families. One of these families is known as the cellulase family F [3] or as the glycosyl hydrolases
family 10 [4,E1]. The enzymes which are currently known to belong to this family are listed below:

Aspergillus awamori xylanase A (xynA). 

Bacillus sp. strain 125 xylanase (xynA). 

Bacillus stearothermophilus xylanase. 

- Butyrivibrio fibrisolvens xylanases A (xynA) and B (xynB).

- Caldocellum saccharolyticum bifunctional endoglucanase/exoglucanase (celB). This protein consists of two
domains; it is the N-terminal domain, which has exoglucanase activity, which belongs to this family.
- Caldocellum saccharolyticum xylanase A (xynA).

- Caldocellum saccharolyticum ORF4. This hypothetical protein is encoded in the xynABC operon and is probably
a xylanase.

- Cellulomonas fimi exoglucanase/xylanase (cex).
- Clostridium stercorarium thermostable celloxylanase.

- Clostridium thermocellum xylanases Y (xynY) and Z (xynZ).
- Cryptococcus albidus xylanase.

- Penicillium chrysogenum xylanase (gene xylP).
- Pseudomonas fluorescens xylanases A (xynA) and B (xynB).

- Ruminococcus flavefaciens bifunctional xylanase XYLA (xynA). This protein consists of three domains: an
N-terminal xylanase catalytic domain that belongs to family 11 of glycosyl hydrolases; a central domain composed of
short repeats of Gln, Asn and Trp; and a C-terminal xylanase catalytic domain that belongs to family 10 of glycosyl
hydrolases.

- Streptomyces lividans xylanase A (xlnA).
- Thermoanaerobacter saccharolyticum endoxylanase A (xynA).
- Thermoascus aurantiacus xylanase.

- Thermophilic bacterium Rt8.B4 xylanase (xynA).

[1884] One of the conserved regions in these enzymes is centered on a conserved glutamic acid residue which has
been shown [5], in the exoglucanase from Cellulomonas fimi, to be directly involved in glycosidic bond cleavage by
acting as a nucleophile. This region has been used as a signature pattern.

[1885] Consensus pattern: [GTA]-x(2)-[LIVN]-x-[IVMF]-[ST]-E-[LIY]-[DN]-[LIVMF] [E is the active site residue]
[ 1] Beguin P. Annu. Rev. Microbiol. 44:219-248(1990).

[ 2] Gilkes N.R., Henrissat B., Kilburn D.G., Miller R.C. Jr., Warren R.A.J. Microbiol. Rev. 55:303-315(1991).
[ 3] Henrissat B., Claeyssens M., Tomme P., Lemesle L., Mornon J.-P. Gene 81:83-95(1989).
[1874] 784. Cellulose-binding domain, bacterial type

- Endoglucanase (gene end1) from Butyrivibrio fibrisolvens.

- Endoglucanases A (gene cenA) and B (cenB) from Cellulomonas fimi.

- Exoglucanases A (gene cbhA) and B (cbhB) from Cellulomonas fimi.
- Endoglucanase E-2 (gene celB) from Thermomonospora fusca.

- Endoglucanase A (gene celA) from Microbispora bispora.

- Exocellobiohydrolase (gene cex) from Cellulomonas fimi.

- Xylanases A (gene xynA) and B (xynB) from Pseudomonas fluorescens.

- Chitinase C from Streptomyces lividans.
[1879] 785. Amidases signature
[ 4] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:19-61(1994).
[1896] 789. Formate-tetrahydrofolate ligase signatures

[1897] Formate-tetrahydrofolate ligase (EC 6.3.4.3) (formyltetrahydrofolate synthetase) (FTHFS) is one of the
enzymes participating in the transfer of one-carbon units, an essential element of various biosynthetic pathways. In many
of these processes the transfers of one-carbon units are mediated by the coenzyme tetrahydrofolate (THF). Various
reactions generate one-carbon derivatives of THF which can be interconverted between different oxidation states by
FTHFS, methylenetetrahydrofolate dehydrogenase (EC 1.5.1.5) and methenyltetrahydrofolate cyclohydrolase (EC
3.5.4.9).

[1898] In eukaryotes the FTHFS activity is expressed by a multifunctional enzyme, C-1-tetrahydrofolate synthase
(C1-THF synthase), which also catalyzes the dehydrogenase and cyclohydrolase activities. Two forms of C1-THF
synthases are known [1]: one is located in the mitochondrial matrix, while the second one is cytoplasmic. In both forms
the FTHFS domain consists of about 600 amino acid residues and is located in the C-terminal section of C1-THF
synthase. In prokaryotes FTHFS activity is expressed by a monofunctional homotetrameric enzyme of about 560 amino
acid residues [2].

[1899] The sequence of FTHFS is highly conserved in all forms of the enzyme. As signature patterns, two regions 
that are almost perfectly conserved were selected. The first one is a glycine-rich segment located in the N-terminal 
part of FTHFS and which could be part of an ATP-binding domain [2]. The second pattern is located in the central 
section of FTHFS. 

[1900] Consensus pattern: G-[LIVM]-K-G-G-A-A-G-G-G-Y
Consensus pattern: V-A-T-[IV]-R-A-L-K-x-[HN]-G-G

[ 1] Shannon K.W., Rabinowitz J.C. J. Biol. Chem. 263:7717-7725(1988).

[ 2] Lovell C.R., Przybyla A., Ljungdahl L.G. Biochemistry 29:5687-5694(1990).

[1901] 790. Transthyretin signatures 

[1902] Transthyretin (prealbumin) [1] is a thyroid hormone-binding protein that seems to transport thyroxine (T4) 
from the bloodstream to the brain. It is a protein of about 130 amino acids that assembles as a homotetramer and 
forms an internal channel that binds thyroxine. Transthyretin is mainly synthesized in the brain choroid plexus. In 
humans, variants of the protein are associated with distinct forms of amyloidosis. 

[1903] The sequence of transthyretin is highly conserved in vertebrates. A number of uncharacterized proteins also 
belong to this family: 

Escherichia coli hypothetical protein yedX. 
Bacillus subtilis hypothetical protein yunM. 
Caenorhabditis elegans hypothetical protein R09H10.3. 
Caenorhabditis elegans hypothetical protein ZK697.8.

[1904] Two regions were selected as signature patterns. The first, located in the N-terminal extremity, starts with a
lysine known to be involved in binding T4. The second pattern is located in the C-terminal extremity.

[1905] Consensus pattern: [KH]-[IV]-L-[DN]-x(3)-G-x-P-A-x(2)-[IV]-x-[IV] [The K binds thyroxine]

Consensus pattern: Y-[TH]-[IV]-[AP]-x(2)-L-S-[PQ]-[FYW]-[GS]-[FY]-[QS]

[1906] [ 1] Schreiber G., Richardson S.J. Comp. Biochem. Physiol. 116B:137-160(1997).

[1907] 791. Dihydropteroate synthase signatures

[1908] All organisms require reduced folate cofactors for the synthesis of a variety of metabolites. Most microorgan- 
isms must synthesize folate de novo because they lack the active transport system of higher vertebrate cells which 
allows these organisms to use dietary folates. Enzymes that are involved in the biosynthesis of folates are therefore 
the target of a variety of antimicrobial agents such as trimethoprim or sulfonamides. 

[1909] Dihydropteroate synthase (EC 2.5.1.15) (DHPS) catalyzes the condensation of
6-hydroxymethyl-7,8-dihydropteridine pyrophosphate to para-aminobenzoic acid to form 7,8-dihydropteroate. This is the
second step in the three-step pathway leading from 6-hydroxymethyl-7,8-dihydropterin to 7,8-dihydrofolate. DHPS is the
target of sulfonamides, which are substrate analogs that compete with para-aminobenzoic acid.

[1910] Bacterial DHPS (gene sul or folP) [1] is a protein of about 275 to 315 amino acid residues which is either
chromosomally encoded or found on various antibiotic resistance plasmids. In the lower eukaryote Pneumocystis
carinii, DHPS is the C-terminal domain of a multifunctional folate synthesis enzyme (gene fas) [2].

[1911] Two signature patterns for DHPS were developed. The first signature is located in the N-terminal section of
these enzymes, while the second signature is located in the central section.

[1912] Consensus pattern: [LIVM]-x-[AG]-[LIVMF](2)-N-x-T-x-D-S-F-x-D-x-[SG]
[ 4] Henrissat B. Biochem. J. 280:309-316(1991).

[ 5] Tull D., Withers S.G., Gilkes N.R., Kilburn D.G., Warren R.A.J., Aebersold R. J. Biol. Chem. 266:15621-15625
(1991).
[1886] 787. Fructose-bisphosphate aldolase class-II signatures

[1 888] This family also includes the following proteins: 

Consensus pattern[LIVM]-E-x-E^LIVM]-G-x(2)-[GMJ-[GSTA]-x-E 

1 1] Perham R.N. Biochem. Soc. Trans. 18:185-187(1990). 

[ 2] Marsh J.J., Lebherz H.G. Trends Biochem. Sci. 17- 11 0-1 13(1 992) 

[3] von (tor Oaten C.K, Barbas C.F. III, Wong C.-H.. Sinskey AJ. Mol. Microbiol. 3- 1625-1 637(1 989> 
[ 4] Berry A. Marshall K.E. FEBS Lett 318:11-16(1993). . 1637(1989). 

[1891] 788. Prolyl oligopeptidase family serine active site 

degree of sequence conservation between these sequences yropniia), mere is a high 

- Yeast vacuolar dipeptidyl aminopeptidase B (DPAP B) (gene DAP2).

[1] Rawlings N.D., Polgar L., Barrett A.J. Biochem. J. 279:907-911(1991).
[2] Barrett A.J., Rawlings N.D. Biol. Chem. Hoppe-Seyler 373:353-360(1992).
[3] Polgar L., Szabo E. Biol. Chem. Hoppe-Seyler 373:361-366(1992).
terminal extremity. The mammalian enzyme differs from the bacterial or yeast proteins by having an EF-hand
calcium-binding region (See <PDOC00018>) in its C-terminal extremity.

[1922] Two signature patterns were developed: one based on the first half of the FAD-binding domain and one which
corresponds to a conserved region in the central part of these enzymes.
[1923] Consensus pattern [IV]-G-G-G-x(2)-G-[STACV]-G-x-A-x-D-x(3)-R-G
Consensus pattern G-G-K-x(2)-[GSTE]-Y-R-x(2)-A

[1] Austin D., Larson T.J. J. Bacteriol. 173:101-107(1991).

[2] Roennow B., Kielland-Brandt M.C. Yeast 9:1121-1130(1993).

[3] Brown L.J., McDonald M.J., Lehn D.A., Moran S.M. J. Biol. Chem. 269:14363-14366(1994).

[1924] 794. NOL1/NOP2/sun family signature 

[1925] The following proteins seem to be evolutionarily related:

- Mammalian proliferating-cell nucleolar antigen p120 (gene NOL1), which may play a role in the regulation of the
cell cycle and the increased nucleolar activity that is associated with cell proliferation.

- Yeast nucleolar protein NOP2 (or YNA1), which could be involved in nucleolar function during the onset of growth,
and in the maintenance of nucleolar structure.

- Yeast hypothetical protein YBL024w.
- Bacterial protein sun (also known as fmu).
- Escherichia coli hypothetical protein yebU.

- Mycobacterium tuberculosis hypothetical protein MtCY21B4.24.
- Methanococcus jannaschii hypothetical protein MJ0026.

NOL1 is a protein of 855 residues, NOP2 consists of 618 residues, YBL024w of 684, sun is a protein of about 430 to
450 residues and MJ0026 has 274 residues. They share a conserved central domain which contains some highly
conserved regions. One of these regions was selected as a signature pattern.
[1926] Consensus pattern [FV]-D-[KRA]-[LIVMA]-L-x-D-[AV]-P-C-[ST]-[GA]
[1927] 795. moaA / nifB / pqqE family signature

[1928] A number of proteins involved in the biosynthesis of metallo cofactors have been shown [1,2] to be
evolutionarily related. These proteins are:

- Bacterial and archaebacterial protein moaA, which is involved in the biosynthesis of the molybdenum cofactor
(molybdopterin; MPT).

- Arabidopsis thaliana cnx2, a protein involved in molybdopterin biosynthesis and which is highly similar to moaA.

- Bacillus subtilis narA, which seems to be the moaA ortholog in that bacterium.

- Bacterial protein nifB (or fixZ), which is involved in the biosynthesis of the nitrogenase iron-molybdenum cofactor.

- Bacterial protein pqqE, which is involved in the biosynthesis of the cofactor pyrrolo-quinoline-quinone (PQQ).

- Pyrococcus furiosus cmo, a protein involved in the synthesis of a molybdopterin-based tungsten cofactor.
- Caenorhabditis elegans hypothetical protein F49E2.1.

[1929] All these proteins share, in their N-terminal region, a conserved domain that contains three cysteines. In
moaA, these cysteines have been shown [1] to be important for the biological activity. They could be involved in the
binding of an iron-sulfur cluster.

[1930] Consensus pattern [LIV]-x(3)-C-[NP]-[LIVMF]-[QRS]-C-x-[FYM]-C [The three C's are putative Fe-S ligands]

[1] Menendez C., Igloi G., Henninger H., Brandsch R. Arch. Microbiol. 164:142-151(1995).
[2] Hoff T., Schnorr K.M., Meyer C., Caboche M. J. Biol. Chem. 270:6100-6107(1995).

[1931] 796. Forkhead-associated (FHA) domain profile 

[1932] The forkhead-associated (FHA) domain [1,E1] is a putative nuclear signalling domain found in a variety of
otherwise unrelated proteins. The FHA domain comprises approximately 55 to 75 amino acids and contains three highly
conserved blocks separated by divergent spacer regions. Currently it has been found in the following proteins:

- Four transcription factors that also contain a forkhead (FH) domain: mouse myocyte nuclear factor 1 (MNF1), yeast
transcription factor FHL1, which probably controls pre-mRNA processing, and yeast FKH1 and FKH2. In these
proteins the FHA domain is located N-terminal of the DNA-binding FH domain.

- Kinase-associated protein phosphatase (KAPP) from Arabidopsis thaliana, a protein which specifically interacts
Consensus pattern [GE]-[SA]-x-[LIVM](2)-D-[LIVM]-G-[GP]-x(2)-[STA]-x-P

[1] Slock J., Stahly D.P., Han C.-Y., Six E.W., Crawford I.P. J. Bacteriol. 172:7211-7226(1990).

[2] Volpe F., Dyer M., Scaife J.G., Darby G., Stammers D.K., Delves C.J. Gene 112:213-218(1992).

[1913] 792. Phosphatidylinositol 3- and 4-kinases signatures 

[1914] Phosphatidylinositol 3-kinase (PI3-kinase) (EC 2.7.1.137) [1] is an enzyme that phosphorylates
phosphoinositides on the 3-hydroxyl group of the inositol ring. The exact function of the three products of PI3-kinase
(PI-3-P, PI-3,4-P(2) and PI-3,4,5-P(3)) is not yet known, although it is proposed that they function as second
messengers in cell signalling. Currently, three forms of PI3-kinase are known:

- The mammalian enzyme, which is a heterodimer of a 110 Kd catalytic chain (p110) and an 85 Kd subunit (p85).

- Yeast TOR1/DRR1 and TOR2/DRR2 [2], PI3-kinases required for cell cycle activation. Both are proteins of about

- Yeast VPS34 [3], a PI3-kinase involved in vacuolar sorting and segregation. VPS34 is a protein of about 100 Kd.

- Arabidopsis thaliana and soybean VPS34 homologs. 

[1915] Phosphatidylinositol 4-kinase (PI4-kinase) (EC 2.7.1.67) [4] is an enzyme that acts on phosphatidylinositol

- Human PI4-kinase alpha.

- Yeast PIK1, a nuclear protein of 120 Kd.
- Yeast STT4, a protein of 214 Kd.

[1916] The PI3- and PI4-kinases share a well conserved domain at their C-terminal section; this domain seems to

[1917] Four additional proteins belong to this family: 

- Mammalian FKBP-rapamycin associated protein (FRAP) [5], which acts as the target for the cell-cycle arrest and
immunosuppressive effects of the FKBP12-rapamycin complex.

- Yeast protein ESR1 [6], which is required for cell growth, DNA repair and meiotic recombination.

- Yeast protein TEL1, which is involved in controlling telomere length.

- Yeast hypothetical protein YHR099w, a distantly related member of this family. 

- Fission yeast hypothetical protein SpAC22E12.16C. 

[1918] Consensus pattern [LIVMFAC]-K-x(1,3)-[DEA]-[DE]-[LIVMC]-R-Q-[DE]-x(4)-Q
Consensus pattern [GS]-x-[AV]-x(3)-[LIVM]-x(2)-[FYH]-[LIVM](2)-x-[LIVMF]-x-D-R-H-x(2)-N
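
The first of these two patterns contains a variable-length gap, x(1,3), meaning that one to three residues of any kind may separate the conserved K from the acidic residues that follow. A small Python sketch, with the regular expression translated by hand from that pattern and with invented test sequences, illustrates how the gap length affects matching; the function name signature_gap is an assumption made for this example.

```python
import re

# Hand translation of [LIVMFAC]-K-x(1,3)-[DEA]-[DE]-[LIVMC]-R-Q-[DE]-x(4)-Q.
# The parentheses capture the variable gap so its length can be inspected.
SIGNATURE_1 = re.compile(r"[LIVMFAC]K(.{1,3})[DEA][DE][LIVMC]RQ[DE].{4}Q")


def signature_gap(sequence):
    """Return the length of the x(1,3) gap in the first match, or None if no match."""
    m = SIGNATURE_1.search(sequence)
    return None if m is None else len(m.group(1))


# Invented sequences that differ only in the number of G residues after "LK".
for gap in ("G", "GGG", "GGGG"):
    seq = "LK" + gap + "DELRQDAAAAQ"
    print(len(gap), signature_gap(seq))
```

With four gap residues the sequence no longer matches, because the pattern allows at most three.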

[1] Hiles I.D., Otsu M., Volinia S., Fry M.J., Gout I., Dhand R., Panayotou G., Ruiz-Larrea F., Thompson A., Totty
N.F., Hsuan J.J., Courtneidge S.A., Parker P.J., Waterfield M.D. Cell 70:419-429(1992).
[2] Kunz J., Henriquez R., Schneider U., Deuter-Reinhard M., Movva N.R., Hall M.N. Cell 73:585-596(1993).

[3] Schu P.V., Takegawa K., Fry M.J., Stack J.H., Waterfield M.D., Emr S.D. Science 260:88-91(1993).

[4] Garcia-Bustos J.F., Marini F., Stevenson I., Frei C., Hall M.N. EMBO J. 13:2352-2361(1994).

[5] Brown E.J., Albers M.W., Shin T.B., Ichikawa K., Keith C.T., Lane W.S., Schreiber S.L. Nature 369:756-758

[6] Kato R., Ogawa H. Nucleic Acids Res. 22:3104-3112(1994).
[1919] 793. FAD-dependent glycerol-3-phosphate dehydrogenase signatures 

[1920] FAD-dependent glycerol-3-phosphate dehydrogenase (EC 1.1.99.5) (GPD) catalyzes the conversion of
glycerol-3-phosphate into dihydroxyacetone phosphate. In bacteria [1] it is associated with the utilization of glycerol
coupled to respiration. In Escherichia coli, two isozymes are known: one expressed under anaerobic conditions (gene glpA)
and one in aerobic conditions (gene glpD). In eukaryotes, a mitochondrial form of GPD participates in the glycerol
phosphate shuttle in conjunction with an NAD-dependent cytoplasmic GPD (EC 1.1.1.8) [2,3].

[1921] These enzymes are proteins of about 60 to 70 Kd which contain a probable FAD-binding domain in their N- 
Number of members: 581
[1946]

[1] Dyda F, Hickman AB, Jenkins TM, Engelman A, Craigie R, Davies DR; Medline: 95099322; "Crystal structure
of the catalytic domain of HIV-1 integrase: similarity to other polynucleotidyl transferases." Science
1994;266:1981-1986.

[2] Lodi PJ, Ernst JA, Kuszewski J, Hickman AB, Engelman A, Craigie R, Clore GM, Gronenborn AM; Medline:
95359147; "Solution structure of the DNA binding domain of HIV-1 integrase." Biochemistry 1995;34:9826-9833.

[1947] 802. lig_chan
Ligand-gated ion channel

[1948] This family includes the four transmembrane regions of the ionotropic glutamate receptors and NMDA
receptors.

Number of members: 128 

[1949] [1] Tong G, Shepherd D, Jahr CE; Medline: 95184014; Synaptic desensitization of NMDA receptors by cal- 
cineurin." Science 1995;267:1510-1512. 
[1950] 803. RhoGAP 
RhoGAP domain 

[1951] GTPase activator proteins towards Rho/Rac/Cdc42-like small GTPases. 

Number of members: 97 

[1952] 

[1] Musacchio A, Cantley LC, Harrison SC; Medline: 97121392; "Crystal structure of the breakpoint cluster
region-homology domain from phosphoinositide 3-kinase p85 alpha subunit." Proc Natl Acad Sci U S A
1996;93:14373-14378.

[2] Barrett T, Xiao B, Dodson EJ, Dodson G, Ludbrook SB, Nurmahomed K, Gamblin SJ, Musacchio A, Smerdon
SJ, Eccleston JF; Medline: 97162209; "The structure of the GTPase-activating domain from p50rhoGAP." Nature
1997;385:458-461.

[3] Rittinger K, Walker PA, Eccleston JF, Nurmahomed K, Owen D, Laue E, Gamblin SJ, Smerdon SJ; Medline:
97404320; "Crystal structure of a small G protein in complex with the GTPase-activating protein rhoGAP." Nature
1997;388:693-697.

[4] Boguski MS, McCormick F; Medline: 94081948; "Proteins regulating Ras and its relatives." Nature
1993;366:643-654.

[1953] 804. vwd 

von Willebrand factor type D domain 

[1954] [1] Bork P; Medline: 93327926; "The modular architecture of a new family of growth regulators related to
connective tissue growth factor." FEBS Lett 1993;327:125-130.

Number of members: 92 

[1955] 805. zf-C4_Topoisom 
Topoisomerase DNA binding C4 zinc finger 

[1] Tse-Dinh YC, Beran-Steed RK; Medline: 89034032; "Escherichia coli DNA topoisomerase I is a zinc
metalloprotein with three repetitive zinc-binding domains." J Biol Chem 1988;263:15857-15859.
[2] Ahumada A, Tse-Dinh YC; Medline: 99011409; "The Zn(II) binding motifs of E. coli DNA topoisomerase I is part
of a high-affinity DNA binding domain." Biochem Biophys Res Commun 1998;251:509-514.

Number of members: 51 



[1956] 806. AIRC 
AIR carboxylase 

- Human nuclear antigen Ki67, which is expressed only in proliferating cells.

- Yeast hypothetical protein YHR115c, which contains a RING-finger C-terminal of the FHA domain.

- Caenorhabditis elegans hypothetical protein ZK632.2.

- Caenorhabditis elegans hypothetical protein C01G6.5.

- MOrTZIT* TT 6 An c abaena ' "** C ° ntainS a moW N -terminal of the FHA domain 

ro£rhe\^ 

[1] Hofmann K.O., Bucher P. Trends Biochem. Sci. 20:347-349(1995).

[2] Stone J.M., Collinge M.A., Smith R.D., Horn M.A., Walker J.C. Science 266:793-795(1994).

[3] Navas T.A., Zhou Z., Elledge S.J. Cell 80:29-39(1995).

[1933] 797. Ald_Xan_dh_C

Aldehyde oxidase and xanthine dehydrogenase, C terminus

Number of members: 54 

[1935] 798. Glyco_hydro_38 
Glycosyl hydrolases family 38 

[1936] Glycosyl hydrolases are key enzymes of carbohydrate metabolism. 
Number of members: 20 

[1] Henrissat B; Medline: 98313424; "Glycosidase families." Biochem Soc Trans 1998;26:153-156.
HECT-domain (ubiquitin-transferase). 

[1939] The name HECT comes from Homologous to the E6-AP Carboxyl Terminus.
Number of members: 43 

[1940] [1] Huibregtse JM, Scheffner M, Beaudenon S, Howley PM; Medline: 95223981; "A family of proteins
structurally and functionally related to the E6-AP ubiquitin-protein ligase." Proc Natl Acad Sci U S A 1995;92:2563-2567.
[1941] 800. HRDC

HRDC domain 



Number of members: 19 



tjQl. Integrase 

-^e^ - - — mtegrase is 

domain m .The carboxyl ,erm,na. toZXlZ^^T ^ Cemra ' * *" 



ORQ 
[2] Tamkun J.W., Deuring R., Scott M.P., Kissinger M., Pattatucci A.M., Kaufman T.C., Kennison J.A. Cell
68:561-572(1992).

[3] Tamkun J.W. Curr. Opin. Genet. Dev. 5:473-477(1995).

[1963] 808. (CH) Actinin-type actin-binding domain signatures
PROSITE cross-reference(s): PS00019; ACTININ.1, PS00020; ACTININ.2 

[1964] Alpha-actinin is an F-actin cross-linking protein which is thought to anchor actin to a variety of intracellular
structures [1]. The actin-binding domain of alpha-actinin seems to reside in the first 250 residues of the protein. A
similar actin-binding domain has been found in the N-terminal region of many different actin-binding proteins [2,3]:

- In the beta chain of spectrin (or fodrin).

- In dystrophin, the protein defective in Duchenne muscular dystrophy (DMD) and which may play a role in anchoring
the cytoskeleton to the plasma membrane.
- In the slime mold gelation factor (or ABP-120).

- In actin-binding protein ABP-280 (or filamin), a protein that links actin filaments to membrane glycoproteins.

- In fimbrin (or plastin), an actin-bundling protein. Fimbrin differs from the above proteins in that it contains two
tandem copies of the actin-binding domain and that these copies are located in the C-terminal part of the protein.

[1965] Two conserved regions were selected as signature patterns for this type of domain. The first of these regions is
located at the beginning of the domain, while the second one is located in the central section and has been shown to
be essential for the binding of actin.

[1966] Consensus pattern [EQ]-x(2)-[ATV]-[FY]-x(2)-W-x-N

Consensus pattern [LIVM]-x-[SGN]-[LIVM]-[DAGHE]-[SAG]-x-[DNEAG]-[LIVM]-x-[DEAG]-x(4)-[LIVM]-x-[LM]-[SAG]-
[LIVM]-[LIVMT]-W-x-[LIVM](2)

[1] Schleicher M., Andre E., Harmann A., Noegel A.A. Dev. Genet. 9:521-530(1988).
[2] Matsudaira P. Trends Biochem. Sci. 16:87-92(1991).
[3] Dubreuil R.R. BioEssays 13:219-226(1991).

[1967] 809. (COX1) Heme-copper oxidase subunit I, copper B binding region signature
PROSITE cross-reference(s): PS00077; COX1

Heme-copper respiratory oxidases [1] are oligomeric integral membrane protein complexes that catalyze the terminal 
step in the respiratory chain: they transfer electrons from cytochrome c or a quinol to oxygen. Some terminal oxidases 
generate a transmembrane proton gradient across the plasma membrane (prokaryotes) or the mitochondrial inner 
membrane (eukaryotes). The enzyme complex consists of 3-4 subunits (prokaryotes) up to 13 polypeptides (mammals)
of which only the catalytic subunit (equivalent to mammalian subunit 1 (CO I)) is found in all heme-copper respiratory 
oxidases. The presence of a bimetallic center (formed by a high-spin heme and copper B) as well as a low-spin heme, 
both ligated to six conserved histidine residues near the outer side of four transmembrane spans within CO I is common 
to all family members [2-4]. 

[1968] In contrast to eukaryotes, the respiratory chain of prokaryotes is branched to multiple terminal oxidases. The
enzyme complexes vary in heme and copper composition, substrate type and substrate affinity. The different respiratory
oxidases allow the cells to customize their respiratory systems according to a variety of environmental growth conditions
[1].

[1969] Recently a component of an anaerobic respiratory chain has also been found to contain the copper B binding
signature of this family: nitric oxide reductase (NOR) exists in denitrifying species of Archaea and Eubacteria.
[1970] Enzymes that belong to this family are: 

- Mitochondrial-type cytochrome c oxidase (EC 1.9.3.1), which uses cytochrome c as electron donor. The electrons
are transferred via copper A (Cu(A)) and heme a to the bimetallic center of CO I that is formed by a
penta-coordinated heme a and copper B (Cu(B)). Subunit 1 contains 12 transmembrane regions. Cu(B) is said to be ligated
to three of the conserved histidine residues within the transmembrane segments 6 and 7.

- Quinol oxidase from prokaryotes that transfers electrons from a quinol to the binuclear center of polypeptide I. This
category of enzymes includes Escherichia coli cytochrome O terminal oxidase complex, which is a component of
the aerobic respiratory chain that predominates when cells are grown at high aeration.

- FixN, the catalytic subunit of a cytochrome c oxidase expressed in nitrogen-fixing bacteroids living in root nodules.
The high affinity for oxygen allows oxidative phosphorylation under low oxygen concentrations. A similar enzyme
has been found in other purple bacteria.

Nitric oxide reductase (EC 1.7.99.7) from Pseudomonas stutzeri. NOR reduces nitrate to dinitrogen. It is a het- 
Some members of the family contain two copies of this domain. Number of members: 35

[1957] 807. Bromodomain signature and profile

PROSITE cross-reference(s): PS00633; BROMODOMAIN_1, PS50014; BROMODOMAIN_2

The bromodomain [1,2,3] is a conserved region of about 70 amino acids found in the following proteins:

of the cell cycle. dm9 Pt ° tem a " d SeemS essent,al ,or Progression of the Gl phase 

" m^™ N °L\ P ^ locus 

BRG1 . three brahma - ,lk8 P«*«» are known: SNF2a(hBRM). SNF2b, and 

- Yeast NPS1/STH1, involved in G(2) phase control in mitosis.

- Yeast SNF2/SWI2, which is part of a complex with the SNF5, SNF6, SWI3 and ADR6/SWI1 proteins. This
SWI-complex is involved in transcriptional activation.

- ^iP^' abator of Ty elements and possibly other genes 
- Caenorhabditis elegans protein cbp-1.

- Yeast hypothetical protein YGR056w.

- Yeast hypothetical protein YKR008w.

- Yeast hypothetical protein L9638.1.

» ^b,. ^ ^ m „ a ^ «,^ spa r«, a JZSS b " > ™ ao ™' : • ™» —*» 
[1981] Signature patterns were developed for both conserved regions.

[1982] Consensus pattern [EDQH]-x-K-x-[DN]-G-x-R-[GACIVM] [K is the active site residue]

[1983] Consensus pattern E-G-[LIVMA]-[LIVM](2)-[KR]-x(5,8)-[YW]-[QN^^
[1985] FAD-dependent glycerol-3-phosphate dehydrogenase (EC 1.1.99.5) (GPD) catalyzes the conversion of
glycerol-3-phosphate into dihydroxyacetone phosphate. In bacteria [1] it is associated with the utilization of glycerol coupled
to respiration. In Escherichia coli, two isozymes are known: one expressed under anaerobic conditions (gene glpA) 
and one in aerobic conditions (gene glpD). In eukaryotes, a mitochondrial form of GPD participates in the glycerol 
phosphate shuttle in conjunction with an NAD-dependent cytoplasmic GPD (EC 1.1.1.8) [2, 3]. 
[1986] These enzymes are proteins of about 60 to 70 Kd which contain a probable FAD-binding domain in their N- 
terminal extremity. The mammalian enzyme differs from the bacterial or yeast proteins by having an EF-hand calcium- 
binding region (See <PDOC00018>) in its C-terminal extremity. 

[1987] Two signature patterns were developed: one based on the first half of the FAD-binding domain and one which
corresponds to a conserved region in the central part of these enzymes.
[1988] Consensus pattern [IV]-G-G-G-x(2)-G-[STACV]-G-x-A-x-D-x(3)-R-G
Consensus pattern G-G-K-x(2)-[GSTE]-Y-R-x(2)-A
[1989] 813. (Fapy_DNA_glyco) Formamidopyrimidine-DNA glycosylase signature
PROSITE cross-reference(s): PS01242; FPG

[1990] Formamidopyrimidine-DNA glycosylase (EC 3.2.2.23) [1] (Fapy-DNA glycosylase) (gene fpg) is a bacterial
enzyme involved in DNA repair that excises oxidized purine bases to release
2,6-diamino-4-hydroxy-5N-methylformamidopyrimidine (Fapy) and 7,8-dihydro-8-oxoguanine (8-OxoG) residues. In addition
to its glycosylase activity, FPG can also nick DNA at apurinic/apyrimidinic sites (AP sites). FPG is a monomeric protein
of about 32 Kd which binds and requires zinc for its activity.

[1991] The binding site for zinc seems to be located in the C-terminal part of the enzyme, where four conserved and
essential [2] cysteines are located. A signature pattern was developed based on this region.
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IKthiacT*™ nE - CodaM - w "">' ! DE ■■ JF.. d^ 8 
811. (DNA_ligase) ATP-dependent DNA ligase signatures

PROSITE cross-reference(s): PS00697; DNA_LIGASE_A1, PS00333; DNA_LIGASE_A2
Consensus pattern L-x-F-L-H-x-Y-H-H
[1]
[2006] 817. Immunoglobulins and major histocompatibility complex proteins signature
PROSITE cross-reference(s): PS00290; IG_MHC

[2007] The basic structure of immunoglobulin (Ig) [1] molecules is a tetramer of two light chains and two heavy chains
linked by disulfide bonds. There are two types of light chains: kappa and lambda, each composed of a constant domain
(CL) and a variable domain (VL). There are five types of heavy chains: alpha, delta, epsilon, gamma and mu, all
consisting of a variable domain (VH) and three (in alpha, delta and gamma) or four (in epsilon and mu) constant domains
(CH1 to CH4).

[2008] The major histocompatibility complex (MHC) molecules are made of two chains. In class I [2] the alpha chain 
is composed of three extracellular domains, a transmembrane region and a cytoplasmic tail. The beta chain (beta- 
2-microglobulin) is composed of a single extracellular domain. In class II [3], both the alpha and the beta chains are 
composed of two extracellular domains, a transmembrane region and a cytoplasmic tail. 

[2009] It is known [4,5] that the Ig constant chain domains and a single extracellular domain in each type of MHC
chain are related. These homologous domains are approximately one hundred amino acids long and include a
conserved intradomain disulfide bond. A small pattern around the C-terminal cysteine involved in this disulfide bond
can be used to detect this category of Ig-related proteins.

[2010] Consensus pattern [FY]-x-C-x-[VA]-x-H

Sequences known to belong to this class detected by the pattern:

- Ig heavy chains type Alpha C region: All, in CH2 and CH3.
- Ig heavy chains type Delta C region: All, in CH3.
- Ig heavy chains type Epsilon C region: All, in CH1, CH3 and CH4.
- Ig heavy chains type Gamma C region: All, in CH3 and also CH1 in some cases.
- Ig heavy chains type Mu C region: All, in CH2, CH3 and CH4.
- Ig light chains type Kappa C region: In all CL except rabbit and Xenopus.
- Ig light chains type Lambda C region: In all CL except rabbit.
- MHC class I alpha chains: All, in alpha-3 domains, including in the cytomegalovirus MHC-1 homologous protein [6].
- Beta-2-microglobulin: All.
- MHC class II alpha chains: All, in alpha-2 domains.
- MHC class II beta chains: All, in beta-2 domains.
[2011] 818. (IGFBP) Insulin-like growth factor binding proteins signature
PROSITE cross-reference(s): PS00222; IGF_BINDING

[2012] The insulin-like growth factors (IGF-I and IGF-II) bind to specific binding proteins in extracellular fluids with 
Tate S.S., Meister A.
Suzuki H., Kumagai H., Echigo T., Tochikura T.

J. Bacteriol. 171:5169-5172(1989).
Ishiye M., Niwa M.

Biochim. Biophys. Acta 1132:233-239(1992).



[1998] 815. G-protein gamma subunit profile
PROSITE cross-reference(s): PS50058; G_PROTEIN_GAMMA

C-terminus. In mammals there are at least 12 different isoforms of gamma subunits.

Slile^r^" 8 e ' e9anS Pr0,8in e9, ' 1 °- Whfch iS 3 <>< G -P^" waning, contains a G-prote* 

[2002] A profile was developed that spans the complete length of the gamma subunit. 

Pennington S.R. 

Protein Prof. 2:16-315(1995). 

[2003] 816. GNS1/SUR4 family signature 

PROSITE cross-reference(s): PS01188; GNS1_SUR4 

Sh£^ 

- Yeast GNS1 [2], a protein involved in synthesis of 1,3-beta-glucan.

- Yeast hypothetical protein YJL196c.

- Caenorhabditis elegans hypothetical protein C40H1.4.

- Caenorhabditis elegans hypothetical protein D2024.3.

i^im^rcr^ 

from one to three .ransme^™^ 3 C — «*» ** cent*, 

nature pattern. This region is located in Z ! hyd^ ^ ^ 35 a * 
- GTP-binding elongation factors (EF-Tu, EF-1 alpha, EF-G, EF-2, etc.).
- Ras family of GTP-binding proteins (Ras, Rho, Rab, Ral, Ypt1, SEC4, etc.).
- Nuclear protein ran (see <PDOC00859>).
- ADP-ribosylation factors family (see <PDOC00781>).
- Bacterial dnaA protein (see <PDOC00771>).
- Bacterial recA protein (see <PDOC00131>).
- Bacterial recF protein (see <PDOC00539>).

- Guanine nucleotide-binding proteins alpha subunits (Gi, Gs, Gt, Go, etc.).
- DNA mismatch repair proteins mutS family (see <PDOC00388>).
- Bacterial type II secretion system protein E (see <PDOC00567>).

[2021] Not all ATP- or GTP-binding proteins are picked-up by this motif. A number of proteins escape detection 
because the structure of their ATP-binding site is completely different from that of the P-loop. Examples of such proteins 
are the E1-E2 ATPases or the glycolytic kinases. In other ATP- or GTP-binding proteins the flexible loop exists in a 
slightly different form; this is the case for tubulins or protein kinases. A special mention must be reserved for adenylate 
kinase, in which there is a single deviation from the P-loop pattern: in the last position Gly is found instead of Ser or Thr. 
[2022] Consensus pattern [AG]-x(4)-G-K-[ST]
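
The point made above, that a single deviation at the last position of the P-loop is enough to escape detection, can be illustrated with a short Python sketch. The regular expression is a hand translation of [AG]-x(4)-G-K-[ST]; the function name find_p_loops is hypothetical, and both test sequences are illustrative only (the second is the first with its final motif residue changed to Gly).

```python
import re

# Hand translation of [AG]-x(4)-G-K-[ST], wrapped in a lookahead so that
# overlapping occurrences are also reported.
P_LOOP = re.compile(r"(?=([AG].{4}GK[ST]))")


def find_p_loops(sequence):
    """Return a list of (1-based start, matched text) for every P-loop hit."""
    return [(m.start() + 1, m.group(1)) for m in P_LOOP.finditer(sequence)]


canonical = "MTEYKLVVVGAGGVGKSALTIQ"   # illustrative sequence with Ser after G-K
deviant = "MTEYKLVVVGAGGVGKGALTIQ"     # same sequence with Gly in the last position

print(find_p_loops(canonical))
print(find_p_loops(deviant))
```

The second sequence yields no hit, which mirrors the adenylate kinase case described in the text: a scan with this pattern alone would miss such proteins.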
EMBO J. 1:945-951(1982).

[2] 

Moller W., Amons R. 
FEBS Lett. 186:1-7(1985). 
[3] 

Fry D.C., Kuby S.A., Mildvan A.S.

Proc. Natl. Acad. Sci. U.S.A. 83:907-911(1986).

[4] 

Dever T.E., Glynias M.J., Merrick W.C. 

Proc. Natl. Acad. Sci. U.S.A. 84:1814-1818(1987) 

[5]

Saraste M., Sibbald P.R., Wittinghofer A.
Trends Biochem. Sci. 15:430-434(1990).
[6]

Koonin E.V.

J. Mol. Biol. 229:1165-1174(1993). 
[7] 

Higgins C.F., Hyde S.C., Mimmack M.M., Gileadi U., Gill D.R., Gallagher M.P. 

J. Bioenerg. Biomembr. 22:571-592(1990). 

[8] 

Hodgman T.C. 

Nature 333:22-23(1988) and Nature 333:578-578(1988) (Errata) 
[9] 

Linder P., Lasko P., Ashburner M., Leroy P., Nielsen P.J., Nishi K.,
Schnier J., Slonimski P.P.
[2023] 821. PE: PE family

This family is named after a PE motif near the amino terminus of the domain. The PE family of proteins all contain an
amino-terminal region of about 110 amino acids. The carboxyl termini of this family are variable and fall into several
classes. The largest class of PE proteins is the highly repetitive PGRS class, which has a high glycine content. The
function of these proteins is uncertain, but it has been suggested that they may be related to antigenic variation of
Mycobacterium tuberculosis [1]. Number of members: 88

[2024] [1] Medline: 98295987. Deciphering the biology of Mycobacterium tuberculosis from the complete genome
sequence. Cole ST, Brosch R, Parkhill J, Garnier T, Churcher C, Harris D, Gordon SV, Eiglmeier K, Gas S, Barry CE
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inhibit or stimulate the growth promoting effects of the IGFs on cells culture Thev seem to a^hTi!?^ T J,^ 

- Mouse protein CYR61 and its probable chicken homolog, protein CEF-10.

- Human connective tissue growth factor (CTGF) and its mouse homolog, protein FISP-12.

- Vertebrate protein NOV.

[2014] As a signature pattern, a conserved cysteine-rich region located in the N-terminal section of these proteins is
used.
[2015] Consensus pattern G-C-[GS]-C-C-x(2)-C-A-x(6)-C

Sequences known to belong to this class detected by the pattern: ALL, except for IGFBP-6's.
[1] 

Rechler M.M. 

Vitam. Horm. 47:1-114(1993) 
Shimasaki S., Ling N.

Prog. Growth Factor Res. 3:243-266(1991).
[3]
Clemmons D.R.

Trends Endocrinol. Metab. 1:412-417(1990).

[4]

Bradham D.M., Igarashi A., Potter R.L., Grotendorst G.R.

J. Cell Biol. 114:1285-1294(1991).

[5]

Maloisel V., Martinerie C., Dambrine G., Plassiart G., Brisac M., Crochet
J., Perbal B.

Mol. Cell. Biol. 12:10-21(1992).



(myosin_head) ATP/GTP-binding site motif A (P-loop)
PROSITE cross-reference(s): PS00017; ATP_GTP_A

- ATP synthase alpha and beta subunits (see <PDOC00137>).
- Myosin heavy chains.

- Kinesin heavy chains and kinesin-like proteins (see <PDOC00343>).

- Dynamins and dynamin-like proteins (see <PDOC00362>).

- Guanylate kinase (see <PDOC00670>).

- Thymidine kinase (see <PDOC00524>).

- Thymidylate kinase (see <PDOC01034>).
- Shikimate kinase (see <PDOC00868>).

- Nitrogenase iron protein family (nifH/frxC) (see <PDOC00580>).

' Z!^S^^ ^ <ABC PI <POOC00 18 5>, 
- Mammalian Ras GTPase-activating protein (GAP).

- Adaptor proteins mediating binding of guanine nucleotide exchange factors to growth factor receptors: vertebrate
GRB2, Caenorhabditis elegans sem-5 and Drosophila DRK.

- Mammalian Vav oncoprotein, a guanine-nucleotide exchange factor of the CDC24 family.

- Miscellaneous proteins interacting with vertebrate receptor protein tyrosine kinases: oncoprotein Crk, mammalian
cytoplasmic proteins Nck, Shc.

- STAT proteins (signal transducers and activators of transcription).
- Chicken tensin.

- Yeast transcriptional control protein SPT6.

[2033] The profile developed to detect SH2 domains is based on a structural alignment consisting of 8 gap-free 
blocks and 7 linker regions totaling 92 match positions. 

[1] 

Sadowski I., Stone J.C., Pawson T.
Mol. Cell. Biol. 6:4396-4408(1986).
[2]

Russel R.B., Breed J., Barton G.J.
FEBS Lett. 304:15-20(1992).
[3]

Marengere L.E.M., Pawson T.

J. Cell Sci. Suppl. 18:97-104(1994).

[4] 

Pawson T, Schlessinger J. 
Curr. Biol. 3:434-442(1993). 
[5] 

Mayer B.J., Baltimore D. 
Trends Cell. Biol. 3:8-13(1993). 
[6] 

Pawson T. 

Nature 373:573-580(1995). 
[7] 

Kuriyan J., Cowburn D. 

Curr. Opin. Struct. Biol. 3:828-837(1993). 

[2034] 824. Sulfate transporters signature 

PROSITE cross-reference(s): PS01130; SULFATE_TRANSP 

[2035] A number of proteins involved in the transport of sulfate across a membrane, as well as some yet
uncharacterized proteins, have been shown [1,2] to be evolutionarily related. These proteins are:

- Neurospora crassa sulfate permease II (gene cys-14).

- Yeast sulfate permeases (genes SUL1 and SUL2).

- Rat sulfate anion transporter 1 (SAT-1).

- Mammalian DTDST, a probable sulfate transporter which, in human, is involved in the genetic disease diastrophic
dysplasia (DTD).

- Sulfate transporters 1, 2 and 3 from the legume Stylosanthes hamata.

- Human pendrin (gene PDS), which is involved in a number of hearing loss genetic diseases.
- Human protein DRA (Down-Regulated in Adenoma).

- Soybean early nodulin 70.

- Escherichia coli hypothetical protein ychM.

- Caenorhabditis elegans hypothetical protein F41D9.5.

[2036] As expected from their transport function, these proteins are highly hydrophobic and seem to contain about 12
transmembrane domains. The best conserved region seems to be located in the second transmembrane region and
is used as a signature pattern.

[2037] Consensus pattern: [PAV]-x-Y-[GS]-L-Y-[STAG](2)-x(4)-[LIVFYA]-[LIVST]-[YI]-x(3)-[GA]-[GST]-S-[KR]
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12025] 822. (RNB) Ribonuclease II family signature 
PROSITE cross-reference(s): PS01175; RIBONUCLEASE_II

[2026] On the basis of sequence similarities, the following bacterial and eukaryotic proteins seem to form a family:

srveryintheS'toS-direc"^ 
' £2 P T in n S ,!o! <0f SRK1) *** * imp,ica,ed h ,he control <* ^e cell cycle G1 phase 

- Caenorhabditis elegans hypothetical protein F48E8.6.

Sen, m^are "!^S^S^^ t T ^ * 1250 < SSD1 > ** ^uence is high* 
a putative exlclease^l^ " T* 8 *' 0 that < his *— ' «**• a role J 

on the core of this conserved domain " ,0 a " * e8e Pr ° ,e ' nS - S '9 na,ure P*«™ was developed based 

[1] Zilhao R., Camelo L., Arraiano C.M., Mol. Microbiol. 8:43-51(1993).
[2] Noguchi E., Hayashi N., Azuma Y., Seki T., Nakamura M., Nakashima N., Yanagida M., He X., Mueller U., Sazer S.,
Nishimoto T., EMBO J. 15:5595-5605(1996).
[3] Beuf L., Bedu S., Cami B., Joset F., Plant Mol. Biol. 27:779-786(1995).
[4] Mian I.S., Nucleic Acids Res. 25:3187-3195(1997).

[2029] 823. Src homology 2 (SH2) domain profile 
PROSITE cross-reference(s): PS50001; SH2

other WR^i^^nS^p^S^l"^™? f " W6fe la,er ,ound in "»* 

cascades by interacting with hiStSS to 11^^^^^*™*^ 
str« %P h J P hory b ti^ P ende^ ^ ^ tides in a sequence-specific and' 

[2032] So fa, SH2 domains ^^ZmJ^Z^s. * ^ bete - Shee ' S * 

• s~^^ t ^^r (n ™ ecep,or) — — * 

" SSKJSSS^^ - - - dor^n are 

- Mammalian phosphatidyl inositol 3-kinase regulatory p85 subunit <PDOC50007>.

- Some vertebrate and invertebrate protein-tyrosine phosphatases.
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Bacillus subtilis hypothetical protein ypsC. 

Synechocystis strain PCC 6803 hypothetical protein slr0064. 

Methanococcus jannaschii hypothetical proteins MJ0438 and MJ0710. 

[2047] These are hydrophilic proteins of from 40 Kd to about 80 Kd. They can be picked up in the database by the
following pattern.

[2048] Consensus pattern: D-P-[LIVMF]-C-G-[ST]-G-x(3)-[LI]-E
References:

[2049] [1] Bairoch A., Unpublished observations (1997).
[2050] 830. Uncharacterized protein family UPF0031 signatures 

PROSITE cross-reference(s): PS01049; UPF0031_1; PS01050; UPF0031_2

The following uncharacterized proteins have been shown [1] to share regions of similarities:

Yeast chromosome XI hypothetical protein YKL151c.
Caenorhabditis elegans hypothetical protein R107.2.
Escherichia coli hypothetical protein yjeF.
Bacillus subtilis hypothetical protein yxkO.
Helicobacter pylori hypothetical protein HP1363.
Mycobacterium tuberculosis hypothetical protein MtCY77.05c.
Mycobacterium leprae hypothetical protein B229_C2_201.
Synechocystis strain PCC 6803 hypothetical protein sll1433.
Methanococcus jannaschii hypothetical protein MJ1586.

[2051] These are proteins of about 30 to 40 Kd whose central region is well conserved. They can be picked up in
the database by the following patterns.

[2052] Consensus pattern: [SAV]-[IVW]-[LVA]-[LIV]-G-[PNS]-G-L-[GP]-x-[DENQT]
Consensus pattern: [GA]-G-x-G-D-[W]-[LT]-[STA]-G-x-[LIVM]
[2053] 831. (ACOX)
Acyl-CoA oxidase

[2054] This is a family of acyl-CoA oxidases (EC:1.3.3.6). Acyl-CoA oxidase converts acyl-CoA into trans-2-enoyl-CoA
[1].

Number of members: 39 

[2055] [1] Hayashi H, De Bellis L, Yamaguchi K, Kato A, Hayashi M, Nishimura M; Medline: 98192624 "Molecular
characterization of a glyoxysomal long chain acyl-CoA oxidase that is synthesized as a precursor of higher molecular
mass in pumpkin." J Biol Chem 1998;273:8301-8307.
[2056] 832. (AICARFT_IMPCHas)
AICARFT/IMPCHase bienzyme

[2057] This is a family of bifunctional enzymes catalysing the last two steps in de novo purine biosynthesis. The
bifunctional enzyme is found in both prokaryotes and eukaryotes. The second last step is catalysed by
5-aminoimidazole-4-carboxamide ribonucleotide formyltransferase EC:2.1.2.3 (AICARFT); this enzyme catalyses the
formylation of AICAR with 10-formyl-tetrahydrofolate to yield FAICAR and tetrahydrofolate [1]. The last step is
catalysed by IMP (inosine monophosphate) cyclohydrolase EC:3.5.4.10 (IMPCHase), cyclizing FAICAR
(5-formylaminoimidazole-4-carboxamide ribonucleotide) to IMP [1].

Number of members: 22 

[2058] 

[1] Akira T, Komatsu M, Nango R, Tomooka A, Konaka K, Yamauchi M, Kitamura Y, Nomura S, Tsukamoto I;
Medline: 97473523 "Molecular cloning and expression of a rat cDNA encoding 5-aminoimidazole-4-carboxamide
ribonucleotide formyltransferase/IMP cyclohydrolase" [published erratum appears in Gene 1998 Feb 27;208(2):
337] Gene 1997;197:289-293.

{2] Rayl EA, Moroson BA, Beardsley GP; Medline: 96147205 The human purH gene product, 5-aminoimidazole- 
4-carboxamide ribonucleotide formyltransf e rase/I MP cyclohydrolase. Cloning, sequencing, expression, purifica- 
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m 

Sandal N.N., Marcker K.A. 

Trends Biochem. Sci. 19:19-19(1994) 

[2] 

Smith F.W., Hawkesford M.J., Prosser I.M., Clarkson D.T.
Mol. Gen. Genet. 247:709-715(1995). 



[2038] 825. TYA: TYA transposon protein 

Palmer KJ, Tichelaar W, Myers N, Burns NR, Butcher SJ, Kingsman AJ, Fuller SD; Medline: 97404699 "Cryo-electron
microscopy structure of yeast Ty retrotransposon virus-like particles."

[2040] 826. Aldolase_II

Class II Aldolase and Adducin N-terminal domain. 

References: 
[2041] 

[2042] 827. CBD_2 

-!- Two tryptophan residues are involved in cellulose binding.

-!- Cellulose binding domain found in bacteria.

Number of members: 51

References: 

[2044] 828. P 

References: 
[2045] 

» mJh? er Zl m ^ ra,i0n " P, °" Kex2 P, ° ,ease - GI "^kof P. Fuller RS; EMBO J 1994^3 SoS 

SI f r f ROgU ^ f0feS ° f ,he P ^ °' ,he «*MsMto Prohormone c^ve^^ Su A 

Martm S. UpKind G, LaMendola J, Steiner DF; J Biol Chem 1998;273:11107-11114. convertases - A. 

[2046] 829. Uncharacterized protein family UPF0020 signature 
PROSITE cross-reference(s): PS01261; UPF0020 

The following uncharacterized proteins have been shown [1] to share regions of similarities: 

- Escherichia coli hypothetical protein ycbY and HI0116/15, the corresponding Haemophilus influenzae protein.






References 
[2071] 

[ 1] Hanks S.K., Hunter T., FASEB J. 9:576-596(1995).
[ 2] Hunter T., Meth. Enzymol. 200:3-37(1991).
[ 3] Hanks S.K., Quinn A.M., Meth. Enzymol. 200:38-62(1991).
[ 4] Hanks S.K., Curr. Opin. Struct. Biol. 1:369-383(1991).
[ 5] Hanks S.K., Quinn A.M., Hunter T., Science 241:42-52(1988).
[ 6] Knighton D.R., Zheng J., Ten Eyck L.F., Ashford V.A., Xuong N.-H., Taylor S.S., Sowadski J.M., Science 253:
407-414(1991).
[ 7] Bairoch A., Claverie J.-M., Nature 331:22(1988).
[ 8] Benner S., Nature 329:21-21(1987).
[ 9] Kirby R., J. Mol. Evol. 30:489-492(1992).
[10] Littler E., Stuart A.D., Chee M.S., Nature 358:160-162(1992).
[11] Munoz-Dorado J., Inouye S., Inouye M., Cell 67:995-1006(1991).

[2072] 835. (Asp_Glu_race) 

Aspartate and glutamate racemases signatures 

[2073] Cross-reference(s) PS00923; ASP_GLU_RACEMASE_1 PS00924; 
ASP_GLU_RACEMASE_2 

[2074] Aspartate racemase (EC 5.1.1.13) and glutamate racemase (EC 5.1.1.3) are two evolutionarily related bacterial
enzymes that do not seem to require a cofactor for their activity [1]. Glutamate racemase, which interconverts
L-glutamate into D-glutamate, is required for the biosynthesis of peptidoglycan and some peptide-based antibiotics
such as gramicidin S. In addition to characterized aspartate and glutamate racemases, this family also includes a
hypothetical protein from Erwinia carotovora and one from Escherichia coli (ygeA). Two conserved cysteines are
present in the sequence of these enzymes. They are expected to play a role in catalytic activity by acting as bases in
proton abstraction from the substrate. Signature patterns were developed for both cysteines.

[2075] Consensus pattern: [IVA]-[LIVM]-x-C-x(0,1)-N-[ST]-[MSA]-[STH]-[LIVFYSTANK]

Consensus pattern: [LIVM](2)-x-[AG]-C-T-[DEH]-[LIVMFY]-[PNGRS]-x-[LIVM]
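
Consensus patterns of this kind can be applied mechanically to a sequence. The following is a minimal sketch, not part of the original disclosure, of one way to translate the pattern notation used throughout this section into a regular expression. It assumes the usual reading of the notation: `x` is any residue, `[...]` lists allowed residues, `{...}` lists excluded residues, and `(n)` or `(n,m)` is a repeat count. The function name `prosite_to_regex` and the test peptide are hypothetical.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style consensus pattern into a Python regex string."""
    parts = []
    for element in pattern.split("-"):
        # Separate the residue specification from an optional repeat, e.g. (2) or (0,1).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.group(1), m.group(2), m.group(3)
        if core == "x":
            regex = "."                        # any residue
        elif core.startswith("{") and core.endswith("}"):
            regex = "[^" + core[1:-1] + "]"    # excluded residues
        else:
            regex = core                       # single residue or [allowed set]
        if low is not None:
            regex += "{" + low + ("," + high if high else "") + "}"
        parts.append(regex)
    return "".join(parts)

# Second racemase pattern, as read from the text above (assumed reading).
RACEMASE_2 = "[LIVM](2)-x-[AG]-C-T-[DEH]-[LIVMFY]-[PNGRS]-x-[LIVM]"
regex = prosite_to_regex(RACEMASE_2)

# Hypothetical peptide, used only to exercise the pattern.
hit = re.search(regex, "MMLIKACTDLPALGG")
print(regex, hit.start() if hit else None)
```

The same helper can be reused for the other consensus patterns in this section, provided the pattern text has been read correctly from the printed page.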

[2076] [ 1] Gallo K.A., Knowles J.R., Biochemistry 32:3981-3990(1993).

[2077] 836. (ATP-sulfurylase)

ATP-sulfurylase

[2078] This family consists of ATP-sulfurylase or sulfate adenylyltransferase EC:2.7.7.4, some of which are part of a
bifunctional polypeptide chain associated with adenosyl phosphosulphate (APS) kinase APS_kinase. Both enzymes
are required for PAPS (phosphoadenosine-phosphosulfate) synthesis from inorganic sulphate [2]. ATP sulfurylase
catalyses the synthesis of adenosine-phosphosulfate APS from ATP and inorganic sulphate [1].

Number of members: 37 

[2079] 

[1] Kurima K, Warman ML, Krishnan S, Domowicz M, Krueger RC Jr, Deyrup A, Schwartz NB; Medline: 98337975
"A member of a family of sulfate-activating enzymes causes murine brachymorphism." [published erratum appears
in Proc Natl Acad Sci U S A 1998 Sep 29;95(20):12071] Proc Natl Acad Sci U S A 1998;95:8681-8685.
[2] Rosenthal E, Leustek T; Medline: 96096529 "A multifunctional Urechis caupo protein, PAPS synthetase, has
both ATP sulfurylase and APS kinase activities." Gene 1995;165:243-248.

[2080] 837. (ATP-synt_F) 
ATP synthase (F/14-kDa) subunit 

[2081] This family includes the 14-kDa subunit from vATPases [1], which is in the peripheral catalytic part of the complex
[2]. The family also includes archaebacterial ATP synthase subunit F [3].

Number of members: 23 

[2082] 

[1] Guo Y, Kaiser K, Wieczorek H, Dow JA; Medline: 96269411 The Drosophila melanogaster gene vha14 encoding
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tfcxi, kinetic analysis, and domain mapping.' J Biol Chem 1996;271:2225-2233. 

[2059] 833. (AOX)
Alternative oxidase

[2060] The alternative oxidase is used as a second terminal oxidase in the mitochondria; electrons are transferred
directly from reduced ubiquinol to oxygen, forming water [2]. This is not coupled to ATP synthesis and is not inhibited
by cyanide; this pathway is a single step process [1]. In rice the transcript levels of the alternative oxidase are increased
by low temperature [1].

Number of members: 27

[2061] 



[1] Ito Y, Saisho D, Nakazono M, Tsutsumi N, Hirai A; Medline: 98086211 "Transcript levels of tandem-arranged
alternative oxidase genes in rice are increased by low temperature." Gene 1997;203:121-129.

[2] Li Q, Ritzel RG, McLean LL, McIntosh L, Ko T, Bertrand H, Nargang FE; Medline: 96366413 "Cloning and analysis
of the alternative oxidase gene of Neurospora crassa." Genetics 1996;142:129-140.

[2062] 834. (APH)

Protein kinases signatures and profile

[2063] Cross-reference(s): PS00107; PROTEIN_KINASE_ATP, PS00108;
PROTEIN_KINASE_ST, PS00109; PROTEIN_KINASE_TYR, PS50011;
PROTEIN_KINASE_DOM

[2064] Eukaryotic protein kinases [1 to 5] are enzymes that belong to a very extensive family of proteins which share
a conserved catalytic core common to both serine/threonine and tyrosine protein kinases. There are a number of
conserved regions in the catalytic domain of protein kinases. Two of these regions have been selected to build signature
patterns. The first region, which is located in the N-terminal extremity of the catalytic domain, is a glycine-rich stretch
of residues in the vicinity of a lysine residue, which has been shown to be involved in ATP binding. The second region,
which is located in the central part of the catalytic domain, contains a conserved aspartic acid residue which is important
for the catalytic activity of the enzyme [6]; two signature patterns were derived for that region: one specific for serine/
threonine kinases and the other for tyrosine kinases. A profile was also developed, which is based on the alignment in [1]
and covers the entire catalytic domain.

[2065] Consensus pattern: [LIV]-G-{P}-G-{P}-[FYWMGSTNH]-[SGA]-{PW}-[LIVCAT]-{PD}-x-[GSTACLIVMFY]-x(5,18)-
[LIVMFYWCSTAR]-[AIVP]-[LIVMFAGCKR]-K [K binds ATP]

Sequences known to belong to this class detected by the pattern: the majority of known protein kinases, but it
fails to find a number of them, especially viral kinases, which are quite divergent in this region and are completely
missed by this pattern.

Consensus pattern: [LIVMFYC]-x-[HY]-x-D-[LIVMFY]-K-x(2)-N-[LIVMFYCT](3) [D is an active site residue]

Sequences known to belong to this class detected by the pattern: most serine/threonine-specific protein
kinases, with 10 exceptions (half of them viral kinases) and also Epstein-Barr virus BGLF4 and Drosophila ninaC, which
have respectively Ser and Arg instead of the conserved Lys and which are therefore detected by the tyrosine kinase
specific pattern described below.

Consensus pattern: [LIVMFYC]-x-[HY]-x-D-[LIVMFY]-[RSTAC]-x(2)-N-[LIVMFYC](3) [D is an active site residue]
Sequences known to belong to this class detected by the pattern: ALL tyrosine specific protein kinases, with the
exception of human ERBB3 and mouse blk. This pattern will also detect most bacterial aminoglycoside
phosphotransferases [8,9] and herpesviruses ganciclovir kinases [10], which are proteins structurally and
evolutionarily related to protein kinases. Sequences known to belong to this class detected by the profile: ALL,
except for three viral kinases. This profile also detects receptor guanylate cyclases (see <PDOC00430>)
and 2-5A-dependent ribonucleases. Sequence similarities between these two families and the eukaryotic protein kinase
family have been noticed before. It also detects Arabidopsis thaliana kinase-like protein TMKL1, which seems to have
lost its catalytic activity.

Eukaryotic-type protein kinases have also been found in prokaryotes such as Myxococcus xanthus [11] and Yersinia
pseudotuberculosis. Note that the patterns shown above have been updated since their publication
in [7]. Note that this documentation entry is linked to both signature patterns and a profile. As the profile is much more
sensitive than the patterns, you should use it if you have access to the necessary software tools to do so.
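
The two active-site patterns differ only at the residue after the conserved Asp-plus-hydrophobic pair (Lys for serine/threonine kinases, [RSTAC] for tyrosine kinases) and in the final hydrophobic triplet. The sketch below, which is illustrative only and not part of the original disclosure, shows how a sequence could be assigned to one class or the other on that basis. The regular expressions are hand translations of the two patterns as read from the text above; the function name and the two test fragments are hypothetical.

```python
import re

# Hand-translated active-site patterns (assumed reading of the printed patterns).
SER_THR = re.compile(r"[LIVMFYC].[HY].D[LIVMFY]K.{2}N[LIVMFYCT]{3}")
TYR = re.compile(r"[LIVMFYC].[HY].D[LIVMFY][RSTAC].{2}N[LIVMFYC]{3}")

def classify_active_site(sequence: str):
    """Return 'serine/threonine', 'tyrosine', or None if neither pattern matches."""
    if SER_THR.search(sequence):
        return "serine/threonine"
    if TYR.search(sequence):
        return "tyrosine"
    return None

# Hypothetical fragments chosen only to exercise each branch.
print(classify_active_site("GSLIYRDLKPENLLIDQ"))
print(classify_active_site("AAYVHRDLRAANILVGE"))
print(classify_active_site("AAAAAAAA"))
```

As the text notes, a profile covering the whole catalytic domain is more sensitive than either pattern, so a pattern-based check of this kind is best treated as a quick first pass.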
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[2094] Consensus pattern: N-x-D-G-S-x(4)-C-G-N-{GA]-x-R [C is an active site residue] Sequences known to belong 
to this class detected by the pattern: ALL, except for an Anabaena dapF which has a Ser instead of the active site Cys.
[2095] [ 1] Cirilli M., Zheng R., Scapin G., Blanchard J.S., Biochemistry 37:16452-16458(1998).
[2096] 842. (DNA_gyraseB_C)
DNA topoisomerase II signature 

[2097] Cross-reference(s) PS00177; TOPOISOMERASE_II

DNA topoisomerase II (EC 5.99.1.3) [1,2,3,4,E1] is one of the two types of enzyme that catalyze the interconversion
of topological DNA isomers. Type II topoisomerases are ATP-dependent and act by passing a DNA segment through
a transient double-strand break. Topoisomerase II is found in phages, archaebacteria, prokaryotes, eukaryotes, and
in African Swine Fever virus (ASF). In bacteriophage T4, topoisomerase II consists of three subunits (the products of
genes 39, 52 and 60). In prokaryotes and in archaebacteria the enzyme, known as DNA gyrase, consists of two subunits
(genes gyrA and gyrB [E2]). In some bacteria, a second type II topoisomerase has been identified; it is known as
topoisomerase IV and is required for chromosome segregation. It also consists of two subunits (genes parC and parE).
In eukaryotes, type II topoisomerase is a homodimer.

[2098] There are many regions of sequence homology between the different subtypes of topoisomerase II. The 
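relation between the different subunits is shown in the following representation:

<----------------- About 1400 residues ----------------->

[ Protein 39 ----*---- ][ ------ Protein 52 ------ ]   Phage T4
[ gyrB ----------*---- ][ ------ gyrA ------------ ]   Prokaryote II, Archaebacteria
[ parE ----------*---- ][ ------ parC ------------ ]   Prokaryote IV
[ ---------------*-------------------------------- ]   Eukaryote and ASF

'*': Position of the pattern.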



[2099] As a signature pattern for this family of proteins, a region that contains a highly conserved pentapeptide was
selected. The pattern is located in gyrB, in parE, and in protein 39 of phage T4 topoisomerase.
[2100] Consensus pattern: [LIVMA]-x-E-G-[DN]-S-A-x-[STAG]

[ 1] Sternglanz R., Curr. Opin. Cell Biol. 1:533-535(1990).

[ 2] Bjornsti M.-A., Curr. Opin. Struct. Biol. 1:99-103(1991).

[ 3] Sharma A., Mondragon A., Curr. Opin. Struct. Biol. 5:39-47(1995).

[ 4] Roca J., Trends Biochem. Sci. 20:156-160(1995).

[2101] 843. (DUF16) 
Protein of unknown function 

[2102] The function of this protein is unknown. It appears to only occur in Mycoplasma pneumoniae. 



Number of members: 26 



[2103] [1] Himmelreich R, Hilbert H, Plagens H, Pirkl E, Li BC, Herrmann R; Medline: 97105885 "Complete sequence
analysis of the genome of the bacterium Mycoplasma pneumoniae." Nucleic Acids Res 1996;24:4420-4449.
[2104] 844. (DUF21) 
[2105] Domain of unknown function 

[2106] This transmembrane region has no known function. Many of the sequences in this family are annotated as
hemolysins; however, this is due to a similarity to Swiss:Q54318, which does not contain this domain. This domain is
found in the N-terminus of the proteins, adjacent to two intracellular CBS domains CBS.
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a 14-kDa F-subunit of the vacuolar ATPase." Gene 1996; 172*239-243 

[2] Peng SB, Crider BP, Tsai SJ, Xie XS, Stone DK; Medline: 96216416 "Identification of a 14-kDa subunit associated
with the catalytic sector of clathrin-coated vesicle H+-ATPase." J Biol Chem 1996;271:3324-3327.

[3] Wilms R, Freiberg C, Wegerle E, Meier I, Mayer F, Muller V; Medline: 96324968 "Subunit structure and
organization of the genes of the A1A0 ATPase from the Archaeon Methanosarcina mazei Go1." J Biol Chem 1996;271:



[2083] 838. (CBD_4) 
Starch binding domain 

Number of members: 48 

[2084] 839. (CbiX)

[2085] The function of CbiX is uncertain; however, it is found in cobalamin biosynthesis operons and so may have a
related function. Some CbiX proteins contain a striking histidine-rich region at their C-terminus, which suggests that it
might be involved in metal chelation [1].
Number of members: 6 



[2086] [1] Raux E, Lanois A, Warren MJ, Rambach A, Thermes C; Medline: 98416126 "Cobalamin (vitamin B12)
biosynthesis: identification and characterization of a Bacillus megaterium cobI operon." Biochem J 1998;335.

840. (Complex1_51K) 

51 Kd subunft si9na,ures cros ~ e ,erence(s > p ^ 

Respiratory-chain NADH dehydrogenase (EC 1.6.5.3) [1,2] (also known as complex I or NADH-ubiquinone

to exist in the chloroplast and in cyanobacteria (as a NADH-plastoquinone oxidoreductase). Among the 25 to 30
f^rpept^ 

[2089] The 51 Kd subunit is highly similar to [3,4]:

- Subunit alpha of Alcaligenes eutrophus NAD-reducing hydrogenase (gene hoxF), which also binds to NAD, FMN,

- Subunit NQO1 of Paracoccus denitrificans NADH-ubiquinone oxidoreductase.

- Subunit F of Escherichia coli NADH-ubiquinone oxidoreductase (gene nuoF).

The 51 Kd subunit and the bacterial hydrogenase alpha subunit contain three regions of sequence similarity.
The first one probably corresponds to the NAD-binding site, the second to the FMN-binding site, and the
third one, which contains three cysteines, to the iron-sulfur binding region. Patterns have been developed
for the FMN-binding and for the 2Fe-2S binding regions.

[2091] Consensus pattern: G-[AM]-G-[AR]-Y-[LIVM]-C-G-[DE](2)-[STA](2)-[LIM](2)-[EN]-S
Consensus pattern: E-S-C-G-x-C-x-P-C-R-x-G [The three C's are putative 2Fe-2S ligands]

[ 1] Ragan C.I., Curr. Top. Bioenerg. 15:1-36(1987).

[ 2] Weiss H., Friedrich T., Hofhaus G., Preis D., Eur. J. Biochem. 197:563-576(1991).

[ 3] Fearnley I.M., Walker J.E., Biochim. Biophys. Acta 1140:105-134(1992).

[ 4] Weidner U., Geier S., Ptock A., Friedrich T., Leif H., Weiss H., J. Mol. Biol. 233:109-122(1993).

[2092] 841. (DAP_epimerase) 

Diaminopimelate epimerase signature 

[2093] Cross-reference(s) PS01326; DAP_EPIMERASE 

Diaminopimelate epimerase (EC 5.1.1.7) catalyzes the isomerization of L,L- to D,L-meso-diaminopimelate in the

seem [1] to function as the acid and base in the catalytic mechanism. As a signature pattern, the region surrounding
the first of these two active site cysteines was selected.
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diffusion channels that allows any solutes up to a certain size (that size is known as the exclusion limit) to cross the 
membrane, while other porins are specific for a solute and contain a binding site for that solute inside the pores (these 
are known as selective porins). As porins are the major outer membrane proteins, they also serve as receptor sites 
for the binding of phages and bacteriocins. General diffusion porins generally assemble as trimers in the membrane
and the transmembrane core of these proteins is composed exclusively of beta strands [2]. It has been shown [3] that
a number of general porins are evolutionarily related; these porins are:

- Enterobacteria phoE.
- Enterobacteria ompC.
- Enterobacteria ompF.
- Enterobacteria nmpC.
- Bacteriophage PA-2 LC.
- Neisseria PI.A.
- Neisseria PI.B.


[2125] As a signature pattern a conserved region was selected, located in the C-terminal part of these proteins, 
which spans two putative transmembrane beta strands. 

[2126] Consensus pattern: [LIVMFY]-x(2)-G-x(2)-Y-x-F-x-K-x(2)-[SN]-[STAV]-[LIVMFYW]-V

[1] Benz R., Bauer K., Eur. J. Biochem. 176:1-19(1988).

[2] Jap B.K., Walian P.J., Q. Rev. Biophys. 23:367-403(1990).

[3] Jeanteur D., Lakey J.H., Pattus F., Mol. Microbiol. 5:2153-2164(1991).

[2127] 851. (HlyD)
HlyD family secretion proteins signature

[2128] Cross-reference(s) PS00543; HLYD_FAMILY

Gram-negative bacteria produce a number of proteins which are secreted into the growth medium by a mechanism
that does not require a cleaved N-terminal signal sequence. These proteins, while having different functions, require
the help of two or more proteins for their secretion across the cell envelope. Amongst these are a protein belonging to the
ABC transporters family (see the relevant entry <PDOC00185>) and a protein belonging to a family which is currently
composed [1 to 5] of the following members:



Gene   Species                                              Protein which is exported
hlyD   Escherichia coli                                     Hemolysin
appD   A. pleuropneumoniae                                  Hemolysin
lcnD   Lactococcus lactis                                   Lactococcin A
lktD   A. actinomycetemcomitans, Pasteurella haemolytica    Leukotoxin
rtxD   A. pleuropneumoniae                                  Toxin-III
cyaD   Bordetella pertussis                                 Calmodulin-sensitive adenylate cyclase-hemolysin (cyclolysin)
cvaA   Escherichia coli                                     Colicin V
prtE   Erwinia chrysanthemi                                 Extracellular proteases B and C
aprE   Pseudomonas aeruginosa                               Alkaline protease
emrA   Escherichia coli                                     Drugs and toxins
yjcR   Escherichia coli                                     Unknown



These proteins are evolutionarily related and consist of from 390 to 480 amino acid residues. They seem to be anchored
in the inner membrane by an N-terminal transmembrane region. Their exact role in the secretion process is not yet
known. The C-terminal section of these proteins is the best conserved region; a signature pattern from that region was
derived.

[2129] Consensus pattern: [LIVM]-x(2)-G-[LM]-x(3)-[STGAV]-x-[LIVM^
[LIVMFYW](3)

Sequences known to belong to this class detected by the pattern: ALL, except for emrA and yjcR.




Number of members: 42 



[2107] 845. (DUF56) 
[2108] Integral membrane protein

[2109] The members of this family are putative integral membrane proteins. The function of the family is unknown 
howeverthefamilyirK:^ 
activity resides in this region or its N terminal region. 

Number of members: 13 

[2110] 846. (DUF94) 
[2111] Domain of unknown function

[2112] The function of this domain is unknown. It is found in both eukaryotes and archaebacteria. The alignment
contains three conserved cysteines and a histidine that might be metal binding; however, these are absent in the
archaebacterial proteins.

Number of members: 9 

[2113] 847. (FF)
[2114] FF domain

[2115] This domain may be involved in protein-protein interaction [1].
Number of members: 42 



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J^J-t 6 "?^ M J' L ^ 6r P; MedHne: 99322199 11,6 FF domain: 3 novel ™* »* accompanies WW 
domains." Trends Biochem Sci 1999;24:264-265.

[2117] 848. (FLO_LFY) 

Floricaula / Leafy protein 

!flj! ] ™ s,a ^™ sist ^ 

Src^eCenr ""^ h <* *— prote ' ns *« 

Number of members: 16 

[2119] 

[1] Hofer J, Turner L, Hellens R, Ambrose M, Matthews P, Michael A, Ellis N; Medline: 97411151 "UNIFOLIATA
regulates leaf and flower morphogenesis in pea." Curr Biol 1997;7:581-587.

[2] Weigel D, Alvarez J, Smyth DR, Yanofsky MF, Meyerowitz EM; Medline: 92274452 "LEAFY controls floral
meristem identity in Arabidopsis." Cell 1992;69:843-859.

[2120] 849. (G-patch) 
G-patch domain 

[2121] This domain is found in a number of RNA binding proteins, and is also found in proteins that contain RNA
binding domains. This suggests that this domain may have an RNA binding function. This domain has seven highly
conserved glycines.

Number of members: 47 

[2122] [1] Aravind L, Koonin EV; Medline: 10470032 "G-patch: a new conserved domain in eukaryotic RNA-processing
proteins and type D retroviral polyproteins." Trends Biochem Sci 1999;24:342-344.

[2123] 850. (Gram-ve_porins) 

General diffusion Gram-negative porins signature 

[2124] Cross-reference(s) PS00576; GRAM_NEG_PORIN 

The outer membrane of Gram-negative bacteria acts as a molecular filter for hydrophilic compounds. Proteins known
as porins [1] are responsible for the 'molecular sieve' properties of the outer membrane. Porins form large water-filled
channels which allow the diffusion of hydrophilic molecules into the periplasmic space. Some porins form general
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Major outer membrane lipoprotein (murein-Iipoproteins) (gene Ipp). 
Escherichia coli lipoprotein-28 (gene nlpA).
Escherichia coli lipoprotein-34 (gene nlpB).
Escherichia coli lipoprotein nlpC.
Escherichia coli lipoprotein nlpD.
Escherichia coli osmotically inducible lipoprotein B (gene osmB).
Escherichia coli osmotically inducible lipoprotein E (gene osmE).
Escherichia coli peptidoglycan-associated lipoprotein (gene pal).
Escherichia coli rare lipoproteins A and B (genes rlpA and rlpB).
Escherichia coli copper homeostasis protein cutF (or nlpE).
Escherichia coli plasmids traT proteins.
Escherichia coli Col plasmids lysis proteins.
A number of Bacillus beta-lactamases.
Bacillus subtilis periplasmic oligopeptide-binding protein (gene oppA).
Borrelia burgdorferi outer surface proteins A and B (genes ospA and ospB).
Borrelia hermsii variable major protein 21 (gene vmp21) and 7 (gene vmp7).
Chlamydia trachomatis outer membrane protein 3 (gene omp3).
Fibrobacter succinogenes endoglucanase cel-3.
Haemophilus influenzae proteins Pal and Pcp.
Klebsiella pullulanase (gene pulA).
Klebsiella pullulanase secretion protein pulS.
Mycoplasma hyorhinis protein p37.
Mycoplasma hyorhinis variant surface antigens A, B, and C (genes vlpABC).
Neisseria outer membrane protein H.8.
Pseudomonas aeruginosa lipopeptide (gene lppL).
Pseudomonas solanacearum endoglucanase egl.
Rhodopseudomonas viridis reaction center cytochrome subunit (gene cytC).
Rickettsia 17 Kd antigen.
Shigella flexneri invasion plasmid proteins mxiJ and mxiM.
Streptococcus pneumoniae oligopeptide transport protein A (gene amiA).
Treponema pallidum 34 Kd antigen.
Treponema pallidum membrane protein A (gene tmpA).
Vibrio harveyi chitobiase (gene chb).
Yersinia virulence plasmid protein yscJ.
Halocyanin from Natronobacterium pharaonis [4], a membrane associated copper-binding protein. This is the first
archaebacterial protein known to be modified in such a fashion.

[2140] From the precursor sequences of all these proteins, a consensus pattern and a set of rules to identify this 
type of post-translational modification were derived. 

[2141] Consensus pattern: {DERK}(6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C [C is the lipid attachment
site] Additional rules:
[2142] 1) The cysteine must be between positions 15 and 35 of the sequence under consideration. 2) There must be at
least one Lys or one Arg in the first seven positions of the sequence. Sequences known to belong to this class detected
by the pattern: ALL. Other sequence(s) detected in SWISS-PROT: some 100 prokaryotic proteins. Some of them are
not membrane lipoproteins, but at least half of them could be.
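
The consensus pattern and the two additional rules can be combined into a single check. The following is a minimal sketch under stated assumptions, not part of the original disclosure: positions are counted from 1 at the first residue of the precursor, the pattern is hand-translated into a regular expression, and the function name and test sequences are hypothetical.

```python
import re

# {DERK}(6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C, hand-translated (assumed reading).
LIPOBOX = re.compile(r"[^DERK]{6}[LIVMFWSTAG]{2}[LIVMFYSTAGCQ][AGS]C")

def find_lipid_attachment_site(precursor: str):
    """Return the 1-based position of the candidate lipid-attached Cys, or None."""
    # Rule 2: at least one Lys or Arg in the first seven positions.
    if not re.search(r"[KR]", precursor[:7]):
        return None
    # Try every start position so that overlapping candidates are not skipped.
    for start in range(len(precursor)):
        m = LIPOBOX.match(precursor, start)
        if m:
            cys_pos = m.end()  # the match ends on the Cys, so end() is its 1-based position
            # Rule 1: the Cys must lie between positions 15 and 35.
            if 15 <= cys_pos <= 35:
                return cys_pos
    return None

# Hypothetical precursor sequences used only to exercise the rules.
print(find_lipid_attachment_site("MKATKLVLGAVILGSTLLAGCSSNAKIDQ"))
print(find_lipid_attachment_site("MAATALVLGAVILGSTLLAGCSSNAKIDQ"))
```

As the text above points out, a hit is only a candidate: some sequences that satisfy the pattern and both rules are not membrane lipoproteins.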

References 

[2143] 

[1] Hayashi S., Wu H.C., J. Bioenerg. Biomembr. 22:451-471(1990).
[2] Klein P., Somorjai R.L., Lau P.C.K., Protein Eng. 2:15-20(1988).
[3] von Heijne G., Protein Eng. 2:531-534(1989).
[4] Mattar S., Scharf B., Kent S.B.H., Rodewald K., Oesterhelt D., Engelhard M., J. Biol. Chem. 269:14939-14945
(1994).



[2144] 856. (Lipoprotein_7)
Adhesin lipoprotein 
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[2130] 
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[1] Gilson L. Mahanty H.K.. Kotter R_, EMBO J. 9:3875-3884(1 990). 

[2] Letoffe S., Delepelaire P., Wandersman C., EMBO J. 9:1375-1382(1990).

[4] Duong F., Lazdunski A., Cami B., Murgier M., Gene 121:47-54(1992).
[5] Lewis K., Trends Biochem. Sci. 19:119-123(1994).

[2131] 852. (IBR)
In Between Ring fingers

[2132] The IBR (In Between Ring fingers) domain is found to occur between pairs of ring fingers (zf-C3HC4). The

Number of members: 25 
[2133] 

[1] Morett E, Bork P; Medline: 10366851 "A novel transactivation domain in parkin." Trends Biochem Sci 1999;24:

[2] van der Reijden BA, Erpelinck-Verschueren CA, Lowenberg B, Jansen JH; Medline: 99349709 "TRIADs: a new
class of proteins with a novel cysteine-rich signature." Protein Sci 1999;8:1557-1561.

[2134] 853. (IPPT) 
IPP transferase 



[1] Durand JM, Bjork GR. Kuwae A. Yoshikawa M. Sasakawa C; Medline: 97440126 The modified n^i^wo 

M 'l HUnterlA She " WC ' &liman EC> **** NC " H °PPer AK; Medline: 94187700 Subcellular locations 

SiSSr?"^ T° 01 SeqUenCeS SUffiCfent ,0r ter9e,in 9 to '"ttochondria and demonstration thafS 
chondral and nuclear isoforms commingle in the cytosol." Mol Cell Biol 1994-14-22984306 

Ni^ an , EC, 1 S l USherL8, M8rtln NC> Medline: 91203856 MODS translation initiation sites determine 

Nensopentenyladenosine modification of mitochondrial and cytoplasmic tRNA." Mol Cel. Biol 1 S ^SS^S. 

[2135] 854. (KE2) 
KE2 family protein 

Kg ,e^e ^r £SS~ " ^ ^ " ""^ ^ ^ ««— ' * «** 3 ° NA 

Number of members: 9 
[2137] 

[1] Ha H, Abe K, Artzt K; Medline: 92084131 "Primary structure of the embryo-expressed gene KE2 from the mouse
H-2K region." Gene 1991;107:345-346.

[2138] 855. (Lipoprotein_6) 

Prokaryotic membrane lipoprotein lipid attachment site 

[2139] Cross-reference(s) PS00013; PROKAR_LIPOPROTEIN

In prokaryotes, membrane lipoproteins are synthesized with a precursor signal peptide, which is cleaved by a specific
lipoprotein signal peptidase (signal peptidase II). The peptidase recognizes a conserved sequence and cuts upstream
of a cysteine residue to which a glyceride-fatty acid lipid is attached [1]. Some of the proteins known to undergo such
processing currently include (for recent listings see [1,2,3]):
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Number of members: 23 
[2157] 

[1] Wu P, Brockenbrough JS, Metcalfe AC, Chen S, Aris JP; Medline: 98298165 "Nop5p is a small nucleolar
ribonucleoprotein component required for pre-18 S rRNA processing in yeast." J Biol Chem 1998;273:16453-16463.

[2] Gautier T, Berges T, Tollervey D, Hurt E; Medline: 8038777 "Nucleolar KKE/D repeat proteins Nop56p and Nop58p
interact with Nop1p and are required for ribosome biogenesis." Mol Cell Biol 1997;17:7088-7098.

[3] Weidenhammer EM, Singh M, Ruiz-Noriega M, Woolford JL Jr; Medline: 96184869 "The PRP31 gene encodes
a novel protein required for pre-mRNA splicing in Saccharomyces cerevisiae." Nucleic Acids Res 1996;24:
1164-1170.

[2158] 861. (Nramp)

Natural resistance-associated macrophage protein 

The natural resistance-associated macrophage protein (NRAMP) family consists of Nramp1, Nramp2, and the yeast proteins
Smf1 and Smf2. The NRAMP family is a novel family of functionally related proteins defined by a conserved hydrophobic
core of ten transmembrane domains [5]. The members of this family of membrane proteins are divalent cation transporters.
Nramp1 is an integral membrane protein expressed exclusively in cells of the immune system and is recruited to the
membrane of a phagosome upon phagocytosis [1]. By controlling divalent cation concentrations, Nramp1 may regulate
the intraphagosomal replication of bacteria [1]. Mutations in Nramp1 may genetically predispose an individual to
susceptibility to diseases including leprosy and tuberculosis; conversely, they might provide protection from rheumatoid
arthritis [1]. Nramp2 is a multiple divalent cation transporter for Fe2+, Mn2+ and Zn2+, amongst others; it is
expressed at high levels in the intestine and is the major transferrin-independent iron uptake system in mammals [1]. The
yeast proteins Smf1 and Smf2 may also transport divalent cations [3].

Number of members: 36 

[2159] 

[1] Govoni G, Gros P; Medline: 98383996 Macrophage NRAMP1 and its role in resistance to microbial infections."
Inflamm Res 1998;47:277-284.

[2] Agranoff DD, Krishna S; Medline: 98294035 Metal ion homeostasis and intracellular parasitism." Mol Microbiol
1998;28:403-412.

[3] Pinner E, Gruenheid S, Raymond M, Gros P; Medline: 98030569 Functional complementation of the yeast
divalent cation transporter family SMF by NRAMP2, a member of the mammalian natural resistance-associated
macrophage protein family." J Biol Chem 1997;272:28933-28938.

[4] Cellier M, Belouchi A, Gros P; Medline: 96402487 Resistance to intracellular infections: comparative genomic
analysis of Nramp." Trends Genet 1996;12:201-204.

[5] Cellier M, Prive G, Belouchi A, Kwan T, Rodrigues V, Chia W, Gros P; Medline: 96036029 Nramp defines a
family of membrane proteins." Proc Natl Acad Sci U S A 1995;92:10089-10093.

[2160] 862. (NTP_transf_2)
Nucleotidyltransferase domain 

Members of this family belong to a large family of nucleotidyltransferases [1]. 
Number of members: 83 

[2161] [1] Holm L, Sander C; Medline: 96005605 DNA polymerase beta belongs to an ancient nucleotidyltransferase
superfamily." Trends Biochem Sci 1995;20:345-347.
[2162] 863. (Paramyxo_P) 
Paramyxovirus P phosphoprotein 

[2163] This family consists of paramyxovirus P phosphoprotein from Sendai virus and human and bovine parainfluenza
viruses. The P protein is an essential part of the viral RNA polymerase complex formed from the P and L proteins
[1]. The exact role of the P protein in this complex is unknown, but it is involved in multiple protein-protein interactions
and in binding the polymerase complex to the nucleocapsid or ribonucleoprotein template [1]. It also appears to be
important for the proper folding of the L protein [1]. The paramyxoviruses have a negative-sense ssRNA genome [1].
This family consists of P50 and variable adherence-associated antigen (Vaa) adhesins from Mycoplasma
hominis. M. hominis is a mycoplasma associated with human urogenital diseases, pneumonia, and septic arthritis [1].
An adhesin is a cell surface molecule that mediates adhesion to other cells or to the surrounding surface or substrate.
The Vaa antigen is a 50-kDa surface lipoprotein that has four tandem repetitive DNA sequences encoding a periodic
peptide structure, and is immunogenic in the human host [1]. p50 is also a 50-kDa lipoprotein, having three
repeats A, B and C, that may be a tetramer of 191-kDa in its native environment [2].



Number of members: 18 
[2146] 



adhesin encoded by divergent vaa genes." Infect Immun 1996;64:2737-2744.

[2] Henrich B, Kitzerow A, Feldmann RC, Schaal H, Hadding U; Medline: 97047675 Repetitive elements of the
Mycoplasma hominis adhesin p50 can be differentiated by monoclonal antibodies."



[2147] 857. (MaoC_like)
MaoC like domain 

The MaoC protein is found to share similarity with a wide variety of enzymes, including estradiol 17 beta-dehydrogenase
domains. This domain is also present in the NodN nodulation protein N. No specific function has been assigned.

Number of members: 46 

[1] Sugino H, Sasaki M, Azakami H, Yamashita M, Murooka Y; Medline: 96235221 A monoamine-regulated

[2150] 858. (MSP) 

Manganese-stabilizing protein / photosystem II polypeptide 

[2151] This family consists of the 33 kDa photosystem II polypeptide from the oxygen evolving complex (OEC) of
plants and cyanobacteria. The protein is also known as the manganese-stabilizing protein, as it is associated with the
manganese complex of the OEC and may provide the ligands for the complex [1].

Number of members: 17 

[1] Philbrick JB, Zilinskas BA; Medline: 88334494 Cloning, nucleotide sequence and mutational analysis of
the gene encoding the photosystem II manganese-stabilizing polypeptide of Synechocystis 6803."

[2153] 859. (NAC) 

[1] Makarova KS, Aravind L, Galperin MY, Grishin NV, Tatusov RL, Wolf YI, Koonin EV; Medline: 99342100

Number of members: 27 

[2155] 860. (Nop) 

Putative snoRNA binding domain 

This family consists of various pre-RNA processing ribonucleoproteins. The function of the aligned region is
unknown; however, it may be a common RNA or snoRNA or Nop1p binding domain. Nop5p (Nop58p) Swiss:Q12499
from yeast is the protein component of a ribonucleoprotein required for pre-18S rRNA processing and is
suggested to function with Nop1p in a snoRNA complex [1]. Nop56p Swiss:Q12460 and Nop5p interact with Nop1p and
are required for ribosome biogenesis [2]. Prp31p Swiss:P49704 is required for pre-mRNA splicing in S. cerevisiae [3].
and delta 1-pyrroline-5-carboxylate dehydrogenase domains of the multifunctional Escherichia coli PutA protein." J
Mol Biol 1994;243:950-956.
[2175] 868. (PsbP)

[2176] This family consists of the 23 kDa subunit of the oxygen evolving system of photosystem II, or PsbP, from various
plants (where it is encoded by the nuclear genome) and cyanobacteria. The 23 kDa PsbP protein is required for PSII
to be fully operational in vivo; it increases the affinity of the water oxidation site for Cl- and provides the conditions
required for high affinity binding of Ca2+ [2].

Number of members: 25 


[2177] 

[1] Rova EM, Mc Ewen B, Fredriksson PO, Styring S; Medline: 97067138 Photoactivation and photoinhibition are
competing in a mutant of Chlamydomonas reinhardtii lacking the 23-kDa extrinsic subunit of photosystem II." J
Biol Chem 1996;271:28918-28924.

[2] Kochhar A, Khurana JP, Tyagi AK; Medline: 97191538 Nucleotide sequence of the psbP gene encoding precursor
of 23-kDa polypeptide of oxygen-evolving complex in Arabidopsis thaliana and its expression in the wild-type
and a constitutively photomorphogenic mutant." DNA Res 1996;3:277-285.

[2178] 869. (PUA)

[2179] The PUA domain, named after PseudoUridine synthase and Archaeosine transglycosylase, was detected in
archaeal and eukaryotic pseudouridine synthases, archaeal archaeosine synthases, a family of predicted ATPases
that may be involved in RNA modification, and a family of predicted archaeal and bacterial rRNA methylases. Additionally,
the PUA domain was detected in a family of eukaryotic proteins that also contain a domain homologous to the translation
initiation factor eIF1/SUI1; these proteins may comprise a novel type of translation factor. Unexpectedly, the PUA
domain was also detected in bacterial and yeast glutamate kinases; this is compatible with the demonstrated role of
these enzymes in the regulation of the expression of other genes [1]. It is predicted that the PUA domain is an RNA
binding domain.

Number of members: 48

[2180] [1] Aravind L, Koonin EV; Medline: 99193178 Novel predicted RNA-binding domains associated with the translation
machinery." J Mol Evol 1999;48:291-302.
[2181] 870. (RF1)
eRF1-like proteins

[2182] Members of this family are peptide chain release factors. The eukaryotic Release Factor 1 proteins (eRF1s)
are involved in termination of translation. The eRF1 protein is functional for all stop codons and appears to abolish
read-through of these codons. This family also includes other proteins for which the precise molecular function is
unknown. Many of them are from Archaebacteria. These proteins may also be involved in translation termination, but
this awaits experimental verification.

Number of members: 25 

[2183] 


[1] Frolova L, Le Goff X, Rasmussen HH, Cheperegin S, Drugeon G, Kress M, Arman I, Haenni AL, Celis JE,
Philippe M, et al; Medline: 95082951 A highly conserved eukaryotic protein family possessing properties of polypeptide
chain release factor" [see comments] Nature 1994;372:701-703.

[2] Drugeon G, Jean-Jean O, Frolova L, Le Goff X, Philippe M, Kisselev L, Haenni AL; Medline: 97315314 Eukaryotic
release factor 1 (eRF1) abolishes readthrough and competes with suppressor tRNAs at all three termination
codons in messenger RNA." Nucleic Acids Res 1997;25:2254-2258.

[2184] 871. (Ribosomal_L14e) 
Ribosomal protein L14
[2185] This family includes the eukaryotic ribosomal protein L14.




Number of members: 15 
[2164] 

[1] Bowman MC, Smallwood S, Moyer SA; Medline: 99329169 Dissection of Individual Functions of the Sendai
Virus Phosphoprotein in Transcription." J Virol 1999;73:6474-6483.

[2] Matsuoka Y, Curran J, Pelet T, Kolakofsky D, Ray R, Compans RW; Medline: 91237868 The P gene of human
parainfluenza virus type 1 encodes P and C proteins but not a cysteine-rich V protein." J Virol 1991;65:3406-3410.

[2165] 864. (Patatin)

[2166] This family consists of various patatin glycoproteins from plants. The patatin protein accounts for up to 40%
of the total soluble protein in potato tubers [2]. Patatin is a storage protein, but it also has the enzymatic activity of lipid
acyl hydrolase, catalysing the cleavage of fatty acids from membrane lipids [2].

Number of members: 21

[2167] 

[1] Banfalvi Z, Kostyal Z, Barta E; Medline: 95107249 Solanum brevidens possesses a non-sucrose-inducible
patatin gene." Mol Gen Genet 1994;245:517-522.

[2] Mignery GA, Pikaard CS, Park WD; Medline: 88226014 Molecular characterization of the patatin multigene
family of potato." Gene 1988;62:27-44.

[2168] 865. (Pentapeptide_2) 
Pentapeptide repeats (8 copies) 

[2169] These repeats are found in many mycobacterial proteins. These repeats are most common in the PPE family
of proteins, where they are found in the MPTR subfamily of PPE proteins. The function of these repeats is unknown.
The repeat can be approximately described as XNXGX, where X can be any amino acid. These repeats are similar to
Pentapeptide [1]; however, it is not clear if these two families are structurally related.

Number of members: 362 

[2170] 

[1] Bateman A, Murzin A, Teichmann SA; Medline: 98318059 Structure and distribution of pentapeptide repeats
in bacteria." Protein Sci 1998;7:1477-1480.

[2] Cole ST, Brosch R, Parkhill J, Garnier T, Churcher C, Harris D, Gordon SV, Eiglmeier K, Gas S, Barry CE 3rd,
Tekaia F, Badcock K, Basham D, Brown D, Chillingworth T, Connor R, Davies R, Devlin K, Feltwell T, Gentles S,
Hamlin N, Holroyd S, Hornsby T, Jagels K, Barrell BG; Medline: 98295987 Deciphering the biology of Mycobacterium
tuberculosis from the complete genome sequence." Nature 1998;393:537-544.

[2171] 866. (Peptidase_C13) 
Peptidase C13 family

This family of peptidases is known as the hemoglobinase family because it contains a globin degrading enzyme from

M^T^h" fT 42665 are « ound " P- ,s — «*• organisms that have AJZSST 

Members of this family are asparaginyl peptidases [1].

Number of members: 26 

[2172] [1] Chen JM, Dando PM, Rawlings ND, Brown MA, Young NE, Stevens RA, Hewitt E, Watts C, Barrett AJ;
Cloning, isolation, and characterization of mammalian legumain, an asparaginyl endopeptidase."

[2173] 867. (Pro_dh)
Proline dehydrogenase 

Number of members: 25 

[2174] [1] Ling M, Allen SW, Wood JM; Medline: 95055736 Sequence analysis identifies the proline dehydrogenase
The complete genome sequence of the hyperthermophilic, sulphate-reducing archaeon Archaeoglobus fulgidus."
Nature 1997;390:364-370.

[2195] 876. (Transglut_core) 

[2196] Cross-reference(s) PS00547; TRANSGLUTAMINASES 

Transglutaminases (EC 2.3.2.13) (TGase) [1,2] are calcium-dependent enzymes that catalyze the cross-linking of proteins
by promoting the formation of isopeptide bonds between the gamma-carboxyl group of a glutamine in one polypeptide
chain and the epsilon-amino group of a lysine in a second polypeptide chain. TGases also catalyze the conjugation
of polyamines to proteins. The best known transglutaminase is blood coagulation factor XIII, a plasma tetrameric protein
composed of two catalytic A subunits and two non-catalytic B subunits. Factor XIII is responsible for cross-linking fibrin
chains, thus stabilizing the fibrin clot. Other forms of transglutaminases are widely distributed in various organs, tissues
and body fluids. Sequence data are available for the following forms of TGase:

- Transglutaminase K (TGase K), a membrane-bound enzyme found in mammalian epidermis and important for the
formation of the cornified cell envelope (gene TGM1).
- Tissue transglutaminase (TGase C), a monomeric ubiquitous enzyme located in the cytoplasm (gene TGM2).
- Transglutaminase 3, responsible for the later stages of cell envelope formation in the epidermis and the hair follicle
(gene TGM3).
- Transglutaminase 4 (gene TGM4).

[2197] A conserved cysteine is known to be involved in the catalytic mechanism of TGases. The erythrocyte membrane
band 4.2 protein, which probably plays an important role in regulating the shape of erythrocytes and their mechanical
properties, is evolutionarily related to TGases. However, the active site cysteine is substituted by an alanine
and the 4.2 protein does not show TGase activity.

[2198] Consensus pattem:[GT]<>[CA]-W-V-x-[SAH^ [The first C is the 

active site residue] Sequences known to belong to this class detected by the pattern: ALL. Other sequence(s) detected
in SWISS-PROT: NONE.

[2199] [1] Ichinose A., Bottenus R.E., Davie E.W. J. Biol. Chem. 265:13411-13414(1990). [2] Greenberg C.S.,
Birckbichler P.J., Rice R.H. FASEB J. 5:3071-3077(1991).

[2200] 877. (TruB_N) TruB family pseudouridylate synthase (N terminal domain)

Members of this family are involved in modifying bases in RNA molecules. They carry out the conversion of uracil
bases to pseudouridine. This family includes TruB, a pseudouridylate synthase that specifically converts uracil 55 to
pseudouridine in most tRNAs. This family also includes Cbf5p, which modifies rRNA [2].

Number of members: 33 

[2201] 

[1] Nurse K, Wrzesinski J, Bakin A, Lane BG, Ofengand J; Medline: 96079944 Purification, cloning, and properties 
of the tRNA psi 55 synthase from Escherichia coli." RNA 1995;1:102-112. 

[2] Lafontaine DL, Bousquet-Antonelli C, Henry Y, Caizergues-Ferrer M, Tollervey D; Medline: 98139521 The box
H + ACA snoRNAs carry Cbf5p, the putative rRNA pseudouridine synthase." Genes Dev 1998;12:527-537.

[2202] 878. (UDPGP) UTP--glucose-1-phosphate uridylyltransferase

This family consists of UTP-glucose-1-phosphate uridylyltransferases, EC:2.7.7.9, also known as UDP-glucose
pyrophosphorylase (UDPGP) and glucose-1-phosphate uridylyltransferase. UTP-glucose-1-phosphate uridylyltransferase
catalyses the interconversion of MgUTP + glucose-1-phosphate and UDP-glucose + MgPPi [1]. UDP-glucose
is an important intermediate in mammalian carbohydrate interconversion, involved in various metabolic roles depending
on tissue type [1]. In Dictyostelium (slime mold), mutants in this enzyme abort the development cycle [2]. Also within
the family are UDP-N-acetylglucosamine pyrophosphorylase Swiss:Q16222, or AGX1 [3], and two hypothetical proteins
from Borrelia burgdorferi, the Lyme disease spirochaete, Swiss:O51893 and Swiss:O51036.

Number of members: 18 

[2203] 

[1] Duggleby RG, Chao YC, Huang JG, Peng HL, Chang HY; Medline: 96202932 Sequence differences between
human muscle and liver cDNAs for UDPglucose pyrophosphorylase and kinetic properties of the recombinant
Number of members: 15



[2186] 872. (Ribosomal_S27)
Ribosomal protein S27a

[2187] This family of ribosomal proteins consists mainly of the 40S ribosomal protein S27a, which is synthesized as
a C-terminal extension of ubiquitin (CEP). The S27a domain comprises the C-terminal half of the protein. The
synthesis of ribosomal proteins as extensions of ubiquitin promotes their incorporation into nascent ribosomes by a
transient metabolic stabilization and is required for efficient ribosome biogenesis [3]. The ribosomal extension protein
S27a contains a basic region that is proposed to form a zinc finger; its fusion gene is proposed as a mechanism to
maintain a fixed ratio between ubiquitin, necessary for degrading proteins, and ribosomes, a source of proteins [2].

Number of members: 36 



[2188] 873. (Spermine_synth)
Spermine/spermidine synthase

[2189] Spermine and spermidine are polyamines. This family includes spermidine synthase that catalyses the fifth 
(last) step in the biosynthesis of spermidine from arginine, and spermine synthase. 

Number of members: 39 


[2190] 



[1] Mezquita J, Pau M, Mezquita C; Medline: 97449308 Characterization and expression of two chicken cDNAs
encoding ubiquitin fused to ribosomal proteins of 52 and 80 amino acids." Gene 1997;195:313-319.

[2] Redman KL, Rechsteiner M; Medline: 89181932 Identification of the long ubiquitin extension as ribosomal
protein S27a." Nature 1989;338:438-440.

[3] Finley D, Bartel B, Varshavsky A; Medline: 89181925 The tails of ubiquitin precursors are ribosomal proteins
whose fusion to ubiquitin facilitates ribosome biogenesis." Nature 1989;338:394-401.

[2191] 874. (Surp) Surp module

[1] Denhez F, Lafyatis R; Medline: 94266805 Conservation of regulated alternative splicing and identification of functional
domains in vertebrate homologs to the Drosophila splicing regulator, suppressor-of-white-apricot." J Biol Chem
1994;269:16170-16179.

[2192] This domain is also known as the SWAP domain. SWAP stands for Suppressor-of-White-APricot. It has been
suggested that these domains may be RNA binding [1].

Number of members: 32 



[2193] 875. (TFIIE) TFIIE alpha subunit

The general transcription factor TFIIE has an essential role in eukaryotic transcription initiation together with RNA
polymerase II and other general factors. Human TFIIE consists of two subunits, TFIIE-alpha Swiss:P29083 and TFIIE-beta
Swiss:P29084, and joins the preinitiation complex after RNA polymerase II and TFIIF [1]. This family consists of
the conserved amino terminal region of eukaryotic TFIIE-alpha [2] and proteins from archaebacteria that are presumed
to be TFIIE-alpha subunits also, Swiss:O29501 [3].

Number of members: 12 



[2194] 

[1] Ohkuma Y, Sumimoto H, Hoffmann A, Shimasaki S, Horikoshi M, Roeder RG; Medline: 92065982 Structural
motifs and potential sigma homologies in the large subunit of human general transcription factor TFIIE." Nature
1991;354:398-401.

[2] Ohkuma Y, Hashimoto S, Roeder RG, Horikoshi M; Medline: 93087200 Identification of two large subdomains
in TFIIE-alpha on the basis of homology between Xenopus and human sequences." Nucleic Acids Res 1992;20:
5838-5838.

[3] Klenk HP, Clayton RA, Tomb JF, White O, Nelson KE, Ketchum KA, Dodson RJ, Gwinn M, Hickey EK, Peterson
JD, Richardson DL, Kerlavage AR, Graham DE, Kyrpides NC, Fleischmann RD, Quackenbush J, Lee NH, Sutton
GG, Gill S, Kirkness EF, Dougherty BA, McKenney K, Adams MD, Loftus B, Venter JC, et al; Medline: 98049343






A. Asparaginase 2 

[2212] Asparaginase II (L-asparagine aminohydrolase II) is an extracellular protein that may be associated with the
cell wall and whose expression is affected by the availability of nitrogen. Asparaginase II catalyzes the reaction
L-Asparagine + H2O = L-Aspartate + NH3. As many leukemias have high requirements for aspartic acid, asparaginase
II proteins are useful as reagents for screening compounds for activity as leukemia chemotherapy products. Asparaginase
II protein can also be over- or under-expressed to alter amino acid content in plant tissues or to modify nitrogen
fixation and/or nitrogen metabolism in plants.

[2213] Ref: Bon et al. (1997) Appl Biochem Biotechnol 63-65: 203-12


B. Chloroa b-bind 

[2214] Chlorophyll a-b binding proteins are located in the thylakoid membranes of the chloroplast and bind chlorophyll
a and chlorophyll b, thereby triggering a chemical reaction (photosynthesis). These proteins are useful in controlling
the rate, efficiency and/or output of photosynthesis. Overexpression of chlorophyll a-b binding proteins is expected to
increase the rate of photosynthesis.

Ref: Leutwiler et al. (1986) Nucleic Acids Res 14: 4051-64; Brandt et al. (1992) Plant Mol Biol 19: 699-703

C. DMRL synthase

[2215] DMRL Synthase (6,7-Dimethyl-8-Ribityllumazine Synthase) catalyzes the last step in riboflavin (Vitamin B2)
synthesis, condensing 5-amino-6-(1'-D)-ribityl-amino-2,4(1H,3H)-Pyrimidinedione with L-3,4-Dihydroxy-2-Butanone
4-Phosphate, producing 6,7-Dimethyl-8-(1-D-Ribityl)Lumazine. The enzyme forms a homopentamer. Engineering of
these proteins or those with homologous sequences/structures may allow control of the amounts of vitamin B2 available
in plants and/or accumulation of pigment, as well as altering reactions requiring hydrogen ion carriers/transmitters.
Ref: Garcia-Ramirez et al. (1995) J Biol Chem 270: 23801-7

D. E1 N 


[2216] These proteins are ATP-dependent DNA helicases that are required for initiation of viral DNA replication. They
form a complex with the viral E2 protein. The E1-E2 complex binds to the replication origin that contains binding sites
for both proteins. The majority of sequences known for this group of proteins are from various papillomaviruses, a type
of double stranded DNA virus. In plants, the prototype double stranded DNA virus is Cauliflower Mosaic virus (CaMV).
Manipulation of these proteins, especially to produce variant proteins that form non-productive complexes, enables
production of plants that are resistant to infection by double stranded DNA viruses.

Ref: Yang et al. (1993) PNAS USA 90: 5086-90

Ustav and Stenlund (1991) EMBO J 10: 449-57
Callaway et al. (1996) Mol Plant Microbe Interact 9: 810-8

E. EF1 G 

[2217] Elongation Factor-1 is composed of four subunits: alpha, beta, delta and gamma. Gamma subunits are presumed
to play a role in anchoring the complex to other cellular components. Studies of EF-1 genes in plants suggest
that different forms of the EF-1 subunits may be expressed in particular organs or in response to stress. Manipulation
of the activity of these proteins, either by altered expression level or by structural mutation, may result in the accumulation
of a particular protein in a chosen organ or allow production of particular proteins during stress conditions.

Ref: Kinzy et al. (1994) NAR 22: 2703-7; Dunn et al. (1993) Plant Mol Biol 23: 221-5; Aguilar et al. (1991) Plant Mol
Biol 17: 351-60

F. ENV_polyprotein

[2218] This family comprises the envelope or coat proteins known from a number of different retroviruses. In mammalian
species, retroviruses are responsible for diseases such as leukemia and HIV. In plants, retroviruses are known
in both monocot (e.g. Zeon-1) and dicot (e.g. Arabidopsis and tobacco) species and have been shown to induce mutant
alleles at new loci. Engineering of plant ENV proteins may allow mobilization or targeting of endogenous or introduced
enzymes expressed in Escherichia coli." Eur J Biochem 1996;235:173-179.

[2] Ragheb JA, Dottin RP; Medline: 87231075 Structure and sequence of a UDP glucose pyrophosphorylase gene 
of Dictyostelium discoideum." Nucleic Acids Res 1987;15:3891-3906. 

[3] Mio T, Yabe T, Arisawa M, Yamada-Okabe H; Medline: 98269105 The eukaryotic UDP-N-acetylglucosamine
pyrophosphorylases. Gene cloning, protein expression, and catalytic mechanism." J Biol Chem 1998;273:
14392-14397.

[2204] 879. (UPF0044) Uncharacterized protein family UPF0044 signature
Cross-reference(s) PS01301; UPF0044
The following uncharacterized proteins have been shown [1] to be highly similar:

- Bacillus subtilis hypothetical protein
- Escherichia coli hypothetical protein yhbY and HI1333, the corresponding Haemophilus influenzae protein.
- Methanococcus jannaschii hypothetical protein MJ0652.

These are small proteins of 10 to 15 Kd. They can be picked up in the database by the following pattern. This pattern
is located in the N-terminal part of these proteins.

[2205] Consensus pattern: L-[ST]-x(3)-K-x(3)-[KR]-[SGA]-x-[GA]-H-x-L-x-P-[LIV]-x(2)-[LIV]-[GA]-x(2)-G
Sequences known to belong to this class detected by the pattern: ALL. Other sequence(s) detected in SWISS-PROT: NONE.
[2206] 880. (zf-A20) A20-like zinc finger
A20- (an inhibitor of cell death)-like zinc fingers. The zinc finger mediates self-association
in A20. These fingers also mediate IL-1-induced NF-kappa B activation.

Number of members: 22 

[2207] 

[1] Heyninck K, Beyaert R; Medline: 99126071 The cytokine-inducible zinc finger protein A20 inhibits IL-1-induced
NF-kappaB activation at the level of TRAF6." FEBS Lett 1999;442:147-150.

[2] De Valck D, Heyninck K, Van Criekinge W, Contreras R, Beyaert R, Fiers W; Medline: 96390831 A20, an inhibitor
of cell death, self-associates by its zinc finger domain." FEBS Lett 1996;384:61-64.

[3] Song HY, Rothe M, Goeddel DV; Medline: 96270609 The tumor necrosis factor-inducible zinc finger protein
A20 interacts with TRAF1/TRAF2 and inhibits NF-kappaB activation." Proc Natl Acad Sci U S A 1996;93:6721-6725.

[4] Opipari AW Jr, Boguski MS, Dixit VM; Medline: 90368626 The A20 cDNA induced by tumor necrosis factor
alpha encodes a novel type of zinc finger protein." J Biol Chem 1990;265:14705-14708.

[2208] 881. (zf-PARP) 

Poly(ADP-ribose) polymerase zinc finger domain 

Cross-reference(s) PS00347; PARP_ZN_FINGER_1 PS50064; PARP_ZN_FINGER_2 

[2209] Poly(ADP-ribose) polymerase (EC 2.4.2.30) (PARP) [1,2] is a eukaryotic enzyme that catalyzes the covalent
attachment of ADP-ribose units from NAD(+) to various nuclear acceptor proteins. This post-translational modification
of nuclear proteins is dependent on DNA. It appears to be involved in the regulation of various important cellular
processes such as differentiation, proliferation and tumor transformation, as well as in the regulation of the molecular
events involved in the recovery of the cell from DNA damage. Structurally, PARP, about 1000 amino-acid residues long,
consists of three distinct domains: an N-terminal zinc-dependent DNA-binding domain, a central automodification
domain and a C-terminal NAD-binding domain. The DNA-binding region contains a pair of zinc finger domains which
have been shown to bind DNA in a zinc-dependent manner. The zinc finger domains of PARP seem to bind specifically
to single-stranded DNA. DNA ligase III [3] contains, in its N-terminal section, a single copy of a zinc finger highly similar
to those of PARP.

[2210] Consensus pattern: C-[KR]-x-C-x(3)-I-x-K-x(3)-[RG]-x(16,18)-W-[FYH]-H-x(2)-C [The three Cs and the H are
zinc ligands] Sequences known to belong to this class detected by the pattern: ALL. Other sequence(s) detected in
SWISS-PROT: NONE. Sequences known to belong to this class detected by the profile: ALL. Other sequence(s) detected
in SWISS-PROT: NONE.
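
A consensus pattern of this kind can be applied to a protein sequence by translating it into a regular expression. The following is a minimal sketch only, not part of the cited database documentation. It assumes the PARP zinc finger pattern reads C-[KR]-x-C-x(3)-I-x-K-x(3)-[RG]-x(16,18)-W-[FYH]-H-x(2)-C once the print damage is repaired. The function names `prosite_to_regex` and `find_matches` are hypothetical, and only the pattern elements that occur in this entry (single residues, bracketed alternatives, braced exclusions, `x`, and repeat counts) are handled.

```python
import re

# One pattern element: a residue, [alternatives], {exclusions} or x,
# optionally followed by a repeat count such as (3) or (16,18).
_ELEMENT = re.compile(
    r"(?P<core>x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})"
    r"(?:\((?P<lo>\d+)(?:,(?P<hi>\d+))?\))?"
)


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style consensus pattern into a Python regex string."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core = m.group("core")
        if core == "x":
            regex = "."                       # any residue
        elif core.startswith("{"):
            regex = "[^" + core[1:-1] + "]"   # any residue except those listed
        else:
            regex = core                      # literal residue or [alternatives]
        lo, hi = m.group("lo"), m.group("hi")
        if lo:
            regex += "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(regex)
    return "".join(parts)


# Assumed reading of the pattern printed above (OCR damage repaired by hand).
PARP_PATTERN = "C-[KR]-x-C-x(3)-I-x-K-x(3)-[RG]-x(16,18)-W-[FYH]-H-x(2)-C"
PARP_REGEX = re.compile(prosite_to_regex(PARP_PATTERN))


def find_matches(sequence: str):
    """Return (start offset, matched segment) for each non-overlapping hit."""
    return [(m.start(), m.group()) for m in PARP_REGEX.finditer(sequence)]
```

Calling `find_matches` on a one-letter amino acid string returns the zero-based offset and matched segment of each hit. A profile search, which the entry notes is more sensitive than the pattern, would need dedicated software and is not covered by this sketch.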

[2211] Note: This documentation entry is linked to both signature patterns and a profile. As the profile is much more
sensitive than the patterns, you should use it if you have access to the necessary software tools to do so.

[1] Althaus F.R., Richter C.R. Mol. Biol. Biochem. Biophys. 37:1-126(1987).

[2] de Murcia G., Menissier de Murcia J. Trends Biochem. Sci. 19:172-176(1994).

[3] Wei Y.-F., Robins P., Carter K., Caldecott K., Pappin D.J.C., Yu G.-L., Wang R.-P., Shell B.K., Nash R.A., Schar
P., Barnes D.E., Haseltine W.A., Lindahl T. Mol. Cell. Biol. 15:3206-3216(1995).
to be regulated by circadian rhythms and in a light dependent manner, occurring at higher levels in roots, for example,
and lower levels in light-grown tissues such as cotyledons. Generally, HMG proteins are thought to influence transcription
regulation. In plants, HMGs are believed to have a role in maintaining patterns of circadian-regulated expression
for other genes, suggesting that these proteins could be exploited to control growth and development.

Ref: Laudet et al. (1993) Nucleic Acids Res 21: 2493-501; Zheng et al. (1993) Plant Mol Biol 23: 813-23; Grasser et
al. (1993) Plant Mol Biol 23: 619-25

L. IL2 


[2224] Interleukin-2 (IL-2) is produced in mammals by T cells in response to antigenic or mitogenic stimulation and
is crucial for proper regulation and functioning of the immune response. IL-2 is capable of stimulating B cells, monocytes,
lymphokine-activated killer cells, natural killer cells and glioma cells. Plant extracts have also been shown to stimulate
the immune system (for example, mistletoe therapy for human cancer). It is known that IL-2 is involved in feedback
inhibition pathways that impact the inflammatory response as well as the growth inhibition of tumor reactive T cells.
Plant proteins containing IL-2-like sequences are useful as immunity-based therapeutics, acting in a manner similar
to IL-2 in mammals.

Ref: Heike et al. (1997) Scand J Immunol 45: 221-6; Ariel et al. (1998) J Immunol 161: 2465-72; Schink (1997) Anticancer
Drugs 8 Suppl 1: S47-51

M. Oxidored FMN 

[2225] NADPH dehydrogenases catalyze the reaction NADPH + acceptor = NADP(+) + reduced acceptor. One member
of this family is yeast 'old yellow enzyme' (OYE), which is thought to be involved in oxylipin metabolism. A second
yeast family member is a protein that binds estrogen binding protein (EBP) in addition to exhibiting oxidoreductase
activity. An Arabidopsis homolog to OYE has been described and estrogen binding proteins in plants have been reported.
Plant proteins from this class have the potential to be used to modify lipid metabolism/catabolism. These proteins
may also have use as therapeutics for breast and prostate cancer, and other abnormal growth in steroid-sensitive
tissues.

Ref: Baker et al. (1998) Proc Soc Exp Biol Med 217: 317-21; Schaller and Weiler (1997) J Biol Chem 272: 28066-72;
Mandani et al. (1994) PNAS USA 91: 922-6

N. Oxidored q2

[2226] The NADH-plastoquinone oxidoreductases catalyze the reaction NADH + plastoquinone = NAD(+) + plastoquinol.
In plants these reactions occur in the chloroplast and are believed to participate in a chloroplast respiratory
system. Here, the NDH complex is postulated to act as a valve to remove excess reduction equivalents in the chloroplasts.
Manipulation of these proteins may improve the rate or efficiency of photosynthesis.

Ref: Burrows et al. (1998) EMBO J 17: 868-76; Kofer et al. (1998) Mol Gen Genet 258: 166-73; Maier et al. (1995) J
Mol Biol 251: 614-28

<s O. PABP 

[2227] Polyadenylate binding proteins bind the poly(A) tail of mRNA. Plants, as exemplified by Arabidopsis, contain
numerous PABP genes that are expressed in an organ-specific manner. For example, PABP2 is functional in roots and
shoots, while PABP5 is expressed predominantly in immature flowers. The PABP proteins are implicated in numerous
aspects of posttranscriptional regulation including mRNA turnover and translational initiation. Control of activity of PABP
proteins provides the ability to control the expression of various genes in particular organs during development.

Ref: Hilson et al. (1993) Plant Physiol 103: 525-33 Belostotsky and Meagher (1993) PNAS USA 90: 6686-90

P. Parvo coat

[2228] Parvoviruses are linear single-stranded DNA viruses that are encapsulated by three capsid proteins. Plants
are susceptible to infection by single stranded DNA viruses such as Maize streak virus (MSV) and various Gemini
retroviruses, in essence generating a new method for mutant production, gene tagging and the like.

Ref: Mamoun et al. (1990) J Virol 64: 4160-8 Grandbastien et al. (1989) Nature 337: 376-80 Wright and Voytas (1998)
Genetics 149: 703-15

G. Glycosyl hydr9

[2219] Proteins having this domain (previously known as the glycosyl hydrolase family 5 domain) catalyze the
endohydrolysis of 1,4-β-D-glucosidic linkages in cellulose. Numerous plant proteins with this domain exist and are
expressed in an organ specific manner. They are involved in the fruit ripening process, in cell elongation and plant
reproduction. Modulation of the activity of these proteins, either by over- or under-expression or by mutation of the
polypeptide, could be used to affect post-harvest physiology (e.g. rate of ripening) or for engineering reproductive

Ref: Giorda et al. (1990) Biochemistry 29: 7264-9 Tucker et al. (1988) Plant Physiol 88: 1257-62 Shani et al. (1997)
43: 837-42 Milligan and Gasser (1995) Plant Mol Biol 28: 691-711

H. Glycosyl hydr14

[2220] The β-amylases (family 14 of glycosyl hydrolases) catalyze the hydrolysis of 1,4-α-glucosidic linkages in
polysaccharides and remove successive maltose units from the non-reducing ends of the chains. Mutants of β-amylase
in Arabidopsis exhibited altered degradation of starch throughout the diurnal cycle. In addition, the mutant phenotypes
indicated that these enzymes not only affect carbohydrate metabolism/catabolism, but also influence the amount of
pigment stored within particular cells. Manipulation of the β-amylase genes enables control of plant pigmentation (for
example, fibre pigment in cotton) as well as carbohydrate synthesis and degradation.

Ref: Zeeman et al. (1998) Plant J 15: 357-65 Hirano and Nakamura (1997) Plant Physiol 114: 5675-82 Kitamoto et
al. (1988) J Bacteriol 170: 5848-54

I. Glycosyl hydr15

[2221] Glycosyl hydrolases from family 15 (such as 1,4-alpha-D-glucan glucohydrolase) catalyze the hydrolysis of
terminal 1,4-linked alpha-D-glucose residues successively from the non-reducing ends of the chains resulting in the
release of β-D-glucose. In plants these proteins have been tied to the mobilization of the xyloglucan stored in the
cotyledonary cell walls. Proteins such as these could be varied to affect the rate of plant growth (for example during
germination), storage and/or use of glucose and other sugars by plant tissues and alteration of the properties, such
as elasticity, of plant cell walls.

Ref: Crombie et al. (1998) Plant J 15: 27-38 Hata et al. (1991) Agric Biol Chem 55: 941-9

J. Glycosyl hydr20

[2222] Members of the family 20 glycosyl hydrolases catalyze the hydrolysis of terminal non-reducing
N-acetyl-D-hexosamine residues in N-acetyl-β-D-hexosaminides. N-acetyl-β-glucosaminidase belongs to this family and exists in
several different forms (consisting of various combinations of alpha and beta chains) depending on the organism.
Family 20 glycosyl hydrolases have been implicated in lysosomal storage diseases (such as Sandhoff disease) and
glycogen storage disease in humans. These types of proteins are also responsible for the hydrolysis of chitin. In plants
these proteins could be useful in controlling carbohydrate catabolism, thereby influencing the amount of sugars
available for storage and/or use in other metabolic pathways. In addition, it is possible that such proteins could be used to
engineer an endogenous insect protection mechanism, e.g. by secretion of a chitin-hydrolyzing composition by the

Ref: Graham et al. (1988) J Biol Chem 263: 16823-9 O'Dowd et al. (1988) Biochemistry 27: 5216-26

K. HMG box

[2223] The HMG box is a novel type of DNA-binding domain found in a diverse group of proteins. Numerous plant
proteins contain this domain, such as the HMG1/2-like proteins. The expression of some of these HMG proteins appears
U. Signal

[2233] Many plant proteins in this family contain sequences similar to those found in both components of the
prokaryotic family of signal transducers known as the two-component systems. This suggests that activation may require a
transfer of a phosphate group between the transmitter domain and the receiver domain. One family member in
Arabidopsis appears to be involved in ethylene (a plant hormone) signal transduction. Other proteins in this family appear
to be involved in the regulation of gene transcription under conditions of environmental stress. Signal proteins can be
exploited to affect plant growth and development and/or control plant responses to stress conditions such as cold,
nutrient availability, etc.

Ref: Chang et al. (1993) Science 262: 539-44 Nagaya et al. (1993) Gene 131: 119-124 Gottfert et al. (1990) PNAS
USA 87: 2680-4

V. vMSA 

[2234] vMSA proteins are major surface antigens presenting on the envelope of various retroviruses. Surface
antigens of retroviruses are often involved in tropism of the virus. Plants contain retrovirus-like viruses such as
pararetroviruses and retrotransposons (i.e. transposons having long terminal repeats). Plant retrotransposons in particular have
been used to create mutants at various loci, thereby permitting gene isolation, gene tagging and the like. Manipulation
of plant vMSA proteins enables control of tropism of plant retroviruses that might be used for genetic engineering tools,
thus enabling targeting of the virus to particular species and/or tissues of plants.

Ref: Okamoto et al. (1988) J Gen Virol 69: 2575-83 Grandbastien et al. (1989) Nature 337: 376-80 Wright and Voytas
(1998) Genetics 149: 703-15

W. zf-CCCH 

[2235] This family of proteins is defined by having two CX(8)CX(5)CX(3)H-type zinc finger domains. These proteins
cover a broad range of functions. For example, the COP1 protein acts as a repressor of photomorphogenesis in
darkness; light stimuli abolish this suppressive action. In addition, COP1 protein can function as a negative transcriptional
regulator capable of direct interaction with components of the G-protein signaling pathway. As a second example, a
zf-CCCH protein identified in Arabidopsis appears to be involved in the resistance to DNA damage induced by UV light
and chemical DNA-damaging agents. Overexpression of this class of proteins permits production of plants that are
better suited to adverse environments. Manipulation of expression of zf-CCCH proteins functioning as transcriptional
regulators, such as COP1, enables manipulation of some signal transduction pathways.
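
By way of illustration only, a polypeptide sequence can be screened for the CX(8)CX(5)CX(3)H pattern with a regular expression. The Python sketch below is not taken from the specification: the function names and the toy sequence are hypothetical, and X is assumed to denote any single residue.

```python
import re

# CX(8)CX(5)CX(3)H: Cys, 8 residues, Cys, 5 residues, Cys, 3 residues, His.
# The lookahead lets overlapping candidates be reported as well.
CCCH_PATTERN = re.compile(r"(?=(C.{8}C.{5}C.{3}H))")


def find_ccch(seq):
    """Return the 0-based start index of every CX(8)CX(5)CX(3)H match."""
    return [m.start() for m in CCCH_PATTERN.finditer(seq.upper())]


def has_two_ccch_domains(seq):
    """True if at least two matches are present (assumed reading of 'two domains')."""
    return len(find_ccch(seq)) >= 2


# Hypothetical toy sequence containing exactly one motif starting at index 2.
toy = "MA" + "C" + "A" * 8 + "C" + "G" * 5 + "C" + "S" * 3 + "H" + "KK"
print(find_ccch(toy))             # [2]
print(has_two_ccch_domains(toy))  # False
```

A pattern match of this kind only flags candidate sequences; it says nothing about whether the candidate actually binds zinc or nucleic acid.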

Ref: Pang et al. (1993) Nucleic Acids Res 21: 1647-53 Deng et al. (1992) Cell 71: 791-801

X. zf-RanBP

[2236] Proteins falling within this category contain many X-X-F-G and X-F-X-F-G repeats, and may contain
RanBP1-like or PPIase domains. Plant proteins having domains similar to these include PAS1 and GMSTI. PAS1 has
been shown to have dramatic developmental effects that appear to be correlated with both cell division and cell wall
elongation. GMSTI has high identity to the yeast STI stress-inducible gene and has been shown to be heat inducible.
Proteins such as these may be useful for controlling growth and form of development.

Ref: Vittorioso et al. (1998) Mol Cell Biol 18: 3034-43 Hernandez Torres et al. (1995) 27: 1221-6

Y. Peptidase M48. 

[2237] Proteins belonging to this peptidase family are metalloproteases that bind zinc as a cofactor and are located
in the membranes of the endoplasmic reticulum. They function in NH2-terminal proteolytic processing, as shown for
the yeast STE24 gene product. This gene is required for the correct processing of a-factor, a yeast pheromone. Family
M48 peptidases also appear to be required for some prenylation reactions, mediating COOH-terminal CAAX
processing. Prenylation reactions are believed to be involved in the regulation of protein-protein and protein-membrane
interactions. As an example, RAS GTPase activity is regulated in part by localization to the inner side of the plasma
membrane upon prenylation. In plants, proteins from this family could be involved in pollen-stigma interactions such as
those mediating self-pollination vs. outcrossing, or could be members of several secondary metabolism pathways.
[...] these plant viruses are critical to the virus life cycle within the plant. For example, the coat
protein of MSV is thought to be involved in intra- and inter-cellular movement within the plant. Engineering of [...]
parvoviral proteins, especially to produce proteins that interfere [...], enables the production of plants having
better resistance to natural plant single-stranded DNA viruses.

Ref: Liu et al. (1997) J Gen Virol 78: 1265-70 Rohde et al. (1990) Virology 176: 648-51

Q. Pkinase C

[2229] Plant serine/threonine protein kinases possessing this domain are expressed in all tissues and are known [...].
During development, these proteins predominate during high metabolic activity in growing buds, root tips, leaf [...]
and germinating seeds. They are thought to be involved in the control of plant growth and development. [...]
two genes encoding proteins from this family have been described that [...]
stresses. Consequently, engineering Pkinase C proteins provides a way to control [...]
plant, as well as a means to provide endogenous protection against [...]

Ref: Zhang et al. (1994) J Biol Chem 269: 17586-92 Mizoguchi et al. (1995) FEBS Lett 358: 199-204



R. REV 



[2230] The REV proteins act post-transcriptionally to relieve negative repression of GAG and ENV production in
retroviruses such as Human Immunodeficiency Virus type 1 (HIV-1). Plants contain [...] and retrotransposons (i.e. [...]

Ref: [...] (1986) Nature 321: 412-7 [...] et al. (1989) PNAS USA 86: 2433-7 Marquet et al. (1995)
77: 113-24 Grandbastien et al. (1989) Nature 337: 376-80 Wright and Voytas (1998) Genetics 149: 703-15

S. RuBisCo small 



Ref: Giuliano et al. (1988) PNAS USA 85: 7089-93 Dedonder et al. (1993) Plant Physiol 101: 801-8

T. Sialyltransf

[...] family catalyze the [...] locations or enables the [...]




plants, these proteins are located in the thylakoid membranes of the chloroplasts, their expression is light regulated
and they are thought to be involved in degradation of soluble stromal proteins and turn-over of thylakoid proteins [1].
Manipulation of expression and structure of these proteins would have effects on the efficiency of photosynthesis and
the development of chloroplasts.
[2246] Refs 

1 Lindahl M, Tabak S, Cseke L, Pichersky E, Andersson B, Adam Z (1996) J Biol Chem 271: 29329-34.

Ae. UPF0051

[2247] There is some evidence that, in plants, proteins in this family are involved in ATP synthesis in chloroplasts 
[1,2]. Mutations in these proteins or altering their expression would affect the efficiency of photosynthesis and energy 
production. 
[2248] Refs

1 Kostrzewa M, Zetsche K (1992) J Mol Biol 227: 961-70.

2 Kostrzewa M, Zetsche K (1993) Plant Mol Biol 23: 67-76.

Af. E7 

[2249] Papillomaviruses are encapsulated double stranded DNA viruses. The Papillomavirus early protein 7 (E7) is 
known as a potent immortalizing and transforming agent. Transformation by E7 is thought to be mediated by the physical 
association of E7 with cellular proteins regulating entry into the cell cycle [1]. The result is entry into the cell cycle and 
suppression of terminal differentiation in mammalian cells. Thus, engineering of proteins having similarity to papillo- 
mavirus E7 protein enables the production of plants having altered cellular proliferation characteristics and possibly 
altered morphology. For example, overexpression of E7-like proteins would be expected to result in proliferation of 
cells of the tissue in which the E7 protein is expressed, perhaps with suppression of differentiation events. Thus, for 
example, overexpression of E7-like proteins in meristem cells can result in taller plants and suppression of leafing and/ 
or flowering. 
[2250] Refs 

1 Zwerschke W, Jansen-Durr P (2000) Adv Cancer Res 78: 1-29.

Ag. Peptidase U7

[2251] This protein is known to be an integral membrane protein in the cyanobacterium Synechocystis where it
functions to digest cleaved signal peptides [1]. This activity is necessary to maintain proper secretion of mature proteins
across the membrane. In higher plants this protein may be present in the plastid or chloroplast membranes where it
would function by enabling protein movement into and out of the chloroplasts. Mutations in this protein would be
expected to affect the development of plastids, including chloroplasts, or alter the energy transfer system within the
chloroplasts, thereby affecting growth and development.
[2252] Refs 

1 Kaneko T, Sato S, Kotani H, Tanaka A, Asamizu E, Nakamura Y, Miyajima N, Hirosawa M, Sugiura M, Sasamoto S,
Kimura T, Hosouchi T, Matsuno A, Muraki A, Nakazaki N, Naruo K, Okumura S, Shimpo S, Takeuchi C, Wada T,
Watanabe A, Yamada M, Yasuda M, Tabata S (1996) DNA Res 3: 109-36.

A. Activities of Polypeptides Comprising Signal Peptides 

[2253] Polypeptides comprising signal peptides are a family of proteins that are typically (1) targeted to a particular
organelle or intracellular compartment, (2) directed to interact with a particular molecule or (3) secreted outside of a host cell.
Examples of polypeptides comprising signal peptides include, without limitation, secreted proteins, soluble proteins,
receptors, proteins retained in the ER, etc.

[2254] These proteins comprising signal peptides are useful to modulate ligand-receptor interactions, cell-to-cell
communication, signal transduction, intracellular communication, and activities and/or chemical cascades that take
place in an organism outside of or within any particular cell.

[2255] One class of such proteins is soluble proteins which are transported out of the cell. These proteins can act
as ligands that bind to a receptor to trigger signal transduction or to permit communication between cells.
[2256] Another class is receptor proteins which also comprise a retention domain that lodges the receptor protein in
the membrane when the cell transports the receptor to the surface of the cell. Like the soluble ligands, receptors can
also modulate signal transduction and communication between cells.
Ref: Fujimura-Kamada et al. (1997) J Cell Biol. 136: 271-85. Tam et al. (1998) J Cell Biol. 142: 635-49.

Z. DNA pol Viral N

[2238] The DNA pol Viral N domain is located at the N-terminal region of DNA polymerase isolated from several
retroid viruses such as the Cauliflower Mosaic Virus. The domain motif has also been found in numerous organisms
from humans to cyanobacteria. In these organisms, this motif seems to be associated with two [...]:
retrotransposons and mitochondrial genes. In the mitochondria, [...] this domain is [...]
introns. [...] in plants allows [...]
retrotransposons endogenous to plant genomes [...] allows engineering [...] to increase
efficiency of energy utilization by cells.

Ref: [...] Wilson et al. (1994) [...] et al. (1994) [...] Gardner et al. (1981) NAR [...]: 2871-2888 Cummings et al.
(1990) Curr Genet 17: 375-402 Hattori et al. (1986) Nature 321: 625-8

Aa. Calpain inhib 

[2239] This domain is found in calpastatin, an inhibitor protein specific for calpain. Calpain is a non-lysosomal
calcium-dependent intracellular protease that appears to be involved in the dynamic changes of [...]
actin-related structures, during early [...] embryogenesis [...]. The
subcellular distribution of calpastatin is thought to be important to calpain regulation [...]
inhibitor repeat domains would produce developmental abnormalities such as abnormal leaf, root or flower [...]
[2240] Refs 

1 Emori Y and Saigo K (1994) J Biol Chem 269: 25137-42.

2 Mellgren RL, Lane RD, Mericle MT (1989) Biochim Biophys Acta 999: 71-77.

Ab. chorismate bind 

[2241] Chorismate binding domains are present in plant anthranilate synthase (AS) genes. AS genes catalyze the
[...] tryptophan [...] by [...] and glutamine [...]
glutamate. Some of these genes are involved in feedback inhibition by tryptophan [1] while some are feedback
insensitive [2]. In Arabidopsis, two AS genes have overlapping, but different distributions [...]
could influence the plant's defense [...]. AS gene products can be used for in vitro
synthesis of tryptophan and tryptophan derivatives.
[2242] Refs 

1 Niyogi KK, Fink GR (1992) Plant Cell 4: 721-33. 

2 Song HS, Brotherton JE, Gonzales RA, Widholm JM (1998) Plant Physiol 117: 533-43.

Ac. late protein L2

[...] Papillomaviruses are encapsulated double stranded DNA viruses. Plants are susceptible to infection by double [...]
to the virus life cycle within the plant. For example, the coat protein of CaMV is thought to be involved in [...]
movement within the plant [1]. Engineering of proteins having similarity to papillomavirus proteins
[...] of plants [...] resistance to natural [...] double [...] DNA viruses.

1 Thompson SR, Melcher U (1993) J Gen Virol 74: 1141-8.

Ad. Peptidase M41 

[2245] Proteins belonging to this peptidase family are metalloproteases that bind zinc as a cofactor and are integral
membrane proteins. They seem to be involved in the degradation of carboxy-terminal-tagged cytoplasmic proteins [...]
strand of RNA. For plant cells, antisense RNA inhibits gene expression by preventing the accumulation of mRNA which
encodes the enzyme of interest, see, e.g., Sheehy et al., Proc. Nat. Acad. Sci. USA, 85:8805 (1988), and Hiatt et al.,
U.S. Patent No. 4,801,340.
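
The relationship between a coding sequence and its antisense transcript can be sketched in a few lines of Python. This is an illustrative aid only, not a method from the specification; the function names are hypothetical and the input is assumed to be a coding-strand DNA sequence written 5' to 3' using only A, C, G and T.

```python
# Watson-Crick complement table for DNA bases.
_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def sense_mrna(coding_dna):
    """mRNA (5'->3') has the same sequence as the coding strand, with U for T."""
    return coding_dna.upper().replace("T", "U")


def antisense_rna(coding_dna):
    """Antisense RNA (5'->3'): reverse complement of the coding strand, with U for T."""
    dna = coding_dna.upper()
    if set(dna) - set("ACGT"):
        raise ValueError("expected only A, C, G and T")
    return dna.translate(_DNA_COMPLEMENT)[::-1].replace("T", "U")


print(sense_mrna("ATGGCC"))     # AUGGCC
print(antisense_rna("ATGGCC"))  # GGCCAU
```

Read antiparallel, GGCCAU pairs base for base with AUGGCC, which is the pairing on which antisense suppression relies.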

III.A.2. Ribozymes

[2270] Similarly, ribozyme constructs can be transformed into a plant to cleave mRNA and down-regulate translation.

III.A.3. Co-Suppression

[2271] Another method of suppression is by introducing an exogenous copy of the gene to be suppressed.
Introduction of expression cassettes in which a nucleic acid is configured in the sense orientation with respect to the promoter
has been shown to prevent the accumulation of mRNA. This method is described in detail above.

III.A.4. Insertion of Sequences into the Gene to be Modulated

[2272] Yet another means of suppressing gene expression is to insert a polynucleotide into the gene of interest to
disrupt transcription or translation of the gene.

[2273] Homologous recombination could be used to target a polynucleotide insert to a gene using the Cre-Lox system
(A.C. Vergunst et al., Nucleic Acids Res. 26:2729 (1998), A.C. Vergunst et al., Plant Mol. Biol. 38:393 (1998), H. Albert
et al., Plant J. 7:649 (1995)).
[2274] In addition, random insertion of polynucleotides into a host cell genome can also be used to disrupt the gene
of interest. Azpiroz-Leehan et al., Trends in Genetics 13:152 (1997). In this method, screening for clones from a library
containing random insertions is preferred for identifying those that have polynucleotides inserted into the gene of
interest. Such screening can be performed using probes and/or primers described above based on sequences from REF
AND SEQ TABLES 1 AND 2, fragments thereof, and substantially similar sequences thereto. The screening can also
be performed by selecting clones or any transgenic plants having a desired phenotype.

III.A.5. Regulatory Sequence Modulation

[2275] The SDFs described in REF and SEQ TABLES 1 and 2, and fragments thereof are examples of nucleotides 
of the invention that contain regulatory sequences that can be used to suppress or inactivate transcription and/or 
translation from a gene of interest as discussed in I.C.5. 

III.A.6. Genes Comprising Dominant-Negative Mutations

[2276] When suppression of production of the endogenous, native protein is desired it is often helpful to express a
gene comprising a dominant negative mutation. Production of protein variants from genes comprising dominant
negative mutations is a useful tool for research. Genes comprising dominant negative mutations can produce a
variant polypeptide which is capable of competing with the native polypeptide, but which does not produce the native
result. Consequently, over expression of genes comprising these mutations can titrate out an undesired activity of the
native protein. For example, the product from a gene comprising a dominant negative mutation of a receptor can be
used to constitutively activate or suppress a signal transduction cascade, allowing examination of the phenotype and
thus the trait(s) controlled by that receptor and pathway. Alternatively, the protein arising from the gene comprising a
dominant-negative mutation can be an inactive enzyme still capable of binding to the same substrate as the native
protein and therefore competes with such native protein.

[2277] Products from genes comprising dominant-negative mutations can also act upon the native protein itself to
prevent activity. For example, the native protein may be active only as a homo-multimer or as one subunit of a
hetero-multimer. Incorporation of an inactive subunit into the multimer with native subunit(s) can inhibit activity.

[2278] Thus, gene function can be modulated in host cells of interest by inserting into these cells vector constructs
comprising a gene comprising a dominant-negative mutation.
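
The strength of the multimer effect described above can be illustrated with a simple calculation. The Python sketch below rests on assumptions that are not stated in the specification: subunits are taken to assemble at random, mutant and native subunits are taken to be equally stable, and a single inactive subunit is taken to abolish the activity of the whole multimer. The function name is hypothetical.

```python
def fraction_active_multimers(mutant_fraction, subunits):
    """Fraction of multimers made up solely of native subunits.

    Assumes random assembly and that one inactive subunit inactivates
    the whole multimer, so the result is (1 - mutant_fraction) ** subunits.
    """
    if not 0.0 <= mutant_fraction <= 1.0:
        raise ValueError("mutant_fraction must lie between 0 and 1")
    if subunits < 1:
        raise ValueError("subunits must be at least 1")
    return (1.0 - mutant_fraction) ** subunits


# Equal amounts of mutant and native subunit in a homo-tetramer:
print(fraction_active_multimers(0.5, 4))  # 0.0625
```

Under these assumptions, an equal mixture of mutant and native subunits leaves about 6% of tetramers fully native, whereas a simple competitor of a monomeric protein at the same ratio would leave half of the native protein unaffected.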

III.B. Enhanced Expression 



[2279] Enhanced expression of a gene of interest in a host cell can be accomplished by either (1) insertion of an 
exogenous gene; or (2) promoter modulation. 
[2257] In addition, the signal peptide itself can serve as a ligand for some receptors. An example is the interaction of
the ER targeting signal peptide with the signal recognition particle (SRP). Here, the SRP binds to the signal peptide,
halting translation, and the resulting SRP complex then binds to docking proteins located on the surface of the ER,
prompting transfer of the protein into the ER.

[2258] Signal peptide residue composition is described below in Subsection IV.C.1.

III. Methods of Modulating Polypeptide Production

[2259] It is contemplated that polynucleotides of the invention can be incorporated into a host cell or in-vitro system
to modulate polypeptide production. For instance, the SDFs prepared as described herein can be used to prepare
expression cassettes useful in a number of techniques for suppressing or enhancing expression.
[2260] For example, polynucleotides comprising sequences to be transcribed, such as coding sequences of the
present invention, can be inserted into nucleic acid constructs to modulate polypeptide production. Typically such
sequences to be transcribed are heterologous to at least one element of the nucleic acid construct to generate a chimeric
gene or construct.

[2261] Another example of useful polynucleotides are nucleic acid molecules comprising regulatory sequences of
the present invention. Chimeric genes or constructs can be generated when the regulatory sequences of the invention
are linked to heterologous sequences in a vector construct. Within the scope of invention are such chimeric genes and/or
constructs.

[2262] Also within the scope of the invention are nucleic acid molecules, whereof at least a part or fragment of these
DNA molecules is presented in REF AND SEQ TABLES 1 AND 2 of the present application, and wherein the coding
sequence is under the control of its own promoter and/or its own regulatory elements. Such molecules are useful for
[...] the genome of a host cell or of an organism regenerated from said host cell for modulating polypeptide
production.

[2263] Additionally, a vector capable of producing the oligonucleotide can be inserted into the host cell to deliver the 
oligonucleotide. 

[2264] Components to be included in vector constructs are described in more detail both above and
below.

[2265] Whether the chimeric vectors or native nucleic acids are utilized, such polynucleotides can be incorporated
into a host cell to modulate polypeptide production. Native genes and/or nucleic acid molecules can be effective when
exogenous to the host cell.

[2266] Methods of modulating polypeptide expression include, without limitation:
Suppression methods, such as

Antisense
Ribozymes
Co-suppression
Insertion of Sequences into the Gene to be Modulated
Regulatory Sequence Modulation.

as well as Methods for Enhancing Production, such as

Insertion of Exogenous Sequences; and
Regulatory Sequence Modulation.

III.A. Suppression

[2267] Expression cassettes of the invention can be used to suppress expression of endogenous genes which
comprise the SDF sequence. Inhibiting expression can be useful, for instance, to tailor the ripening characteristics of a fruit
(Oeller et al., Science 254:437 (1991)) or to influence seed size (WO98/07842) or to provoke cell ablation (Mariani et
al., Nature 357: 384-387 (1992)).

[2268] As described above, a number of methods can be used to inhibit gene expression in plants such as antisense,
ribozyme, introduction of exogenous genes into a host cell, insertion of a polynucleotide sequence into the coding
sequence and/or the promoter of the endogenous gene of interest, and the like.

III.A.1. Antisense

[2269] An expression cassette as described above can be transformed into a host cell or plant to produce an antisense
ically included. The polyadenylation region can be derived from the natural gene, from a variety of other plant genes,
or from T-DNA.

[2292] The vector comprising the sequences from genes or SDFs of the invention may comprise a marker gene that
confers a selectable phenotype on plant cells. The vector can include promoter and coding sequence, for instance.
For example, the marker may encode biocide resistance, particularly antibiotic resistance, such as resistance to
kanamycin, G418, bleomycin, hygromycin, or herbicide resistance, such as resistance to chlorosulfuron or phosphinothricin.

IV.A. Coding Sequences 

[2293] Generally, the sequence in the transformation vector and to be introduced into the genome of the host cell
does not need to be absolutely identical to an SDF of the present invention. Also, it is not necessary for it to be full
length, relative to either the primary transcription product or fully processed mRNA. Furthermore, the introduced
sequence need not have the same intron or exon pattern as a native gene. Also, heterologous non-coding segments can
be incorporated into the coding sequence without changing the desired amino acid sequence of the polypeptide to be
produced.

IV.B. Promoters 

[2294] As explained above, introducing an exogenous SDF from the same species or an orthologous SDF from
another species can modulate the expression of a native gene corresponding to that SDF of interest. Such an SDF
construct can be under the control of either a constitutive promoter or a highly regulated inducible promoter (e.g., a
copper inducible promoter). The promoter of interest can initially be either endogenous or heterologous to the species
in question. When re-introduced into the genome of said species, such promoter becomes exogenous to said species.
Over-expression of an SDF transgene can lead to co-suppression of the homologous endogenous sequence thereby
creating some alterations in the phenotypes of the transformed species as demonstrated by similar analysis of the
chalcone synthase gene (Napoli et al., Plant Cell 2:279 (1990) and van der Krol et al., Plant Cell 2:291 (1990)). If an
SDF is found to encode a protein with desirable characteristics, its over-production can be controlled so that its
accumulation can be manipulated in an organ- or tissue-specific manner utilizing a promoter having such specificity.
[2295] Likewise, if the promoter of an SDF (or an SDF that includes a promoter) is found to be tissue-specific or
developmentally regulated, such a promoter can be utilized to drive or facilitate the transcription of a specific gene of
interest (e.g., seed storage protein or root-specific protein). Thus, the level of accumulation of a particular protein can
be manipulated or its spatial localization in an organ- or tissue-specific manner can be altered.

IV.C. Signal Peptides

[2296] SDFs of the present invention containing signal peptides are indicated in the REF and SEQ TABLES. In some
cases it may be desirable for the protein encoded by an introduced exogenous or orthologous SDF to be targeted (1)
to a particular organelle or intracellular compartment, (2) to interact with a particular molecule such as a membrane
molecule or (3) for secretion outside of the cell harboring the introduced SDF. This will be accomplished using a signal
peptide.

[2297] Signal peptides direct protein targeting, are involved in ligand-receptor interactions and act in cell to cell
communication. Many proteins, especially soluble proteins, contain a signal peptide that targets the protein to one of
several different intracellular compartments. In plants, these compartments include, but are not limited to, the
endoplasmic reticulum (ER), mitochondria, plastids (such as chloroplasts), the vacuole, the Golgi apparatus, protein storage
vesicles (PSV) and, in general, membranes. Some signal peptide sequences are conserved, such as the Asn-Pro-Ile-Arg
amino acid motif found in the N-terminal propeptide signal that targets proteins to the vacuole (Marty (1999)
The Plant Cell 11: 587-599). Other signal peptides do not have a consensus sequence per se, but are largely composed
of hydrophobic amino acids, such as those signal peptides targeting proteins to the ER (Vitale and Denecke (1999)
The Plant Cell 11: 615-628). Still others do not appear to contain either a consensus sequence or an identified common
secondary sequence, for instance the chloroplast stromal targeting signal peptides (Keegstra and Cline (1999) The
Plant Cell 11: 557-570). Furthermore, some targeting peptides are bipartite, directing proteins first to an organelle and
then to a membrane within the organelle (e.g. within the thylakoid lumen of the chloroplast; see Keegstra and Cline
(1999) The Plant Cell 11: 557-570). In addition to the diversity in sequence and secondary structure, placement of the
signal peptide is also varied. Proteins destined for the vacuole, for example, have targeting signal peptides found at
the N-terminus, at the C-terminus and at a surface location in mature, folded proteins. Signal peptides also serve as
ligands for some receptors.
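
Two of the features mentioned above, the conserved Asn-Pro-Ile-Arg (NPIR) vacuolar motif and the hydrophobic character of ER-targeting signal peptides, lend themselves to a simple sequence check. The Python sketch below is illustrative only. The residue set counted as hydrophobic, the 20-residue default window and the function names are assumptions made for the example and are not taken from the specification or the cited references.

```python
# Residues counted as hydrophobic here: an illustrative choice, not a standard.
HYDROPHOBIC = set("AVLIMFWC")


def has_npir(seq):
    """True if the one-letter sequence contains the Asn-Pro-Ile-Arg motif."""
    return "NPIR" in seq.upper()


def hydrophobic_fraction(seq, start=0, length=20):
    """Fraction of hydrophobic residues in seq[start:start + length]."""
    window = seq.upper()[start:start + length]
    if not window:
        raise ValueError("window is empty")
    return sum(res in HYDROPHOBIC for res in window) / len(window)


# Hypothetical toy sequences.
print(has_npir("MKAFNPIRLPT"))                        # True
print(hydrophobic_fraction("LLLLAAAAKK", length=10))  # 0.8
```

A check of this kind is no substitute for a dedicated signal peptide predictor; it only shows how the two sequence features named in the text can be expressed computationally.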

[2298] These characteristics of signal proteins can be used to more tightly control the phenotypic expression of 
introduced SDFs. In particular, associating the appropriate signal sequence with a specific SDF can allow sequestering 
III.B.1. Insertion of an Exogenous Gene

exp^LelTha^suS eXPreSSi0n enCOdi " 9 30 exo 9 enous 9*™ can boost he number of gene copies 

J!"* 6Xpression «*««•«*• can comprise genes that either encode the native protein that is of interest or 

t2^£Z£ZE£ sequences ,rom REF MD SEQ TA8LES 1 *«> 2 - *E* -d 

[2282] Such an exogenous gene can include either a constitutive promoter permitting expression in any cell in a host

III. B.2. Regulatory Sequence Modulation 

[2283] The SDFs of REF and SEQ TABLES 1 AND 2, and fragments thereof, contain regulatory sequences that can

be used to enhance express^ 

elements. " cases, duphcation c< er^^ 

express^ of a dewed gene from a particular promote, As other examples, all n promoters r^e bTXTa 

to expose a polymerase binding site. In either case, over-production of such proteins can be used to enhance expression
of a gene of interest by increasing the activation time of the promoter.
[2284] Such regulatory proteins are encoded by some of the sequences in REF AND SEQ TABLES 1 AND 2, fragments
thereof, and substantially similar sequences thereto.

[2285] Coding sequences for these proteins can be constructed as described above. 

IV. Gene Constructs and Vector Construction 

f , n ^ technKiues. recombinant DNA vectors which comprise said SDFs and are suitable for trans- 

2TS£i?- - T ^ Pfepared The SDF can be made using standa^S- 

binant DNA techmques (Sambrook et al. 1989) and can be introduced to the species of interest byAarotJi^L 

^l^TZl* ^ ° f * Particle gun Lbardment) as Z£252T 

SS YAcf and pa!^ 0 ^ 19 T * **** 3,1 SUCh 38 P bsn,ids ' via,sea - «««« chromosome^. 

B ACS, YACs and PACs and vectors of the sort described by . 

iS^izsSo'gger ^ M ** USA " : 8794 " 8797 (,992): Hammon et al - Proc - Na «- Acad. Sci. 

(b) YAC: Burke et al., Science 236:806-812 (1987);

(c) PAC: Sternberg N. et al., Proc Natl Acad Sci U S A. Jan;87(1):103-7 (1990);

(d) Bacteria-Yeast Shuttle Vectors: Bradshaw et al., Nucl Acids Res 23:4850-4856 (1995);

(e) Lambda Phage Vectors: Replacement Vector, e.g., Frischauf et al., J. Mol Biol 170:827-842 (1983) or Insertion

IS ™ f £T f 6,Wer NM ^ ° NA Cl0nin9: A Practfcal ™ Oxford IRL P^s (1 9^7 

(f) T-DNA gene fusion vectors: Walden et al., Mol Cell Biol 1:175-194 (1990); and
(g) Plasmid vectors: Sambrook et al., infra

f™f? a j VeC, ° r **■ the "»9enou8 gene, which in its turn comprises an SDF of the present 

construct ch.meraplast. or a codmg sequence with any desired transcriptional and/or translational regulatory sequent 

2Z? ^ ' nd * ^ ,emiin3,k5n sequences - Vectora - *• can abo inS oSf 

SET? ™« attaChmem re9fonS (SARS) " markere ' homotogous sequences, introns. etc. 9 
nSS ,2. , "T^ 8 ^u- 9 ' W th9 d6Sifed P 0 *™**- fof e «mPte a cDNA sequence encoding a full Lng-h 
d , " T K COmb ' ned " ah transcri P Uonal «d translational initiation regulatory sequenced whfch S 

[2290] For example, for over-expression, a plant promoter fragment may be employed that will direct transection 
of the gene « all fssues of a regenerated plant. Attematively. the plan. p«xnoler Jmct t«n«^SaSS 

L a p™" 6 (,iSSUeSP6CifiC PrOT0,erS) ° f ^ - *— ^ - ™me^ 
[2291] If proper polypeptide production is desired, a polyadenylation region at the 3'-end of the coding region is typ-
One of ordinary skill in the art, having this data, can obtain cloned DNA fragments, synthetic DNA fragments or
polypeptides constituting desired sequences by recombinant methodology known in the art or described herein.

EXAMPLES 

[2309] The invention is illustrated by way of the following examples. The invention is not limited by these examples 
as the scope of the invention is defined solely by the claims following. 

EXAMPLE 1: cDNA PREPARATION 

[2310] A number of the nucleotide sequences disclosed in REF AND SEQ TABLES 1 AND 2 herein as representative
of the SDFs of the invention can be obtained by sequencing genomic DNA (gDNA) and/or cDNA from corn plants
grown from HYBRID SEED # 35A19, purchased from Pioneer Hi-Bred International, Inc., Supply Management, P.O.
Box 256, Johnston, Iowa 50131-0256.

[2311] A number of the nucleotide sequences disclosed in REF AND SEQ TABLES 1 AND 2 herein as representative
of the SDFs of the invention can also be obtained by sequencing genomic DNA from Arabidopsis thaliana, Wassilewskija
ecotype, or by sequencing cDNA obtained from mRNA from such plants as described below. This is a true breeding
strain. Seeds of the plant are available from the Arabidopsis Biological Resource Center at the Ohio State University,
under the accession number CS2360. Seeds of this plant were deposited under the terms and conditions of the Budapest
Treaty at the American Type Culture Collection, Manassas, VA on August 31, 1999, and were assigned ATCC
No. PTA-595.

[2312] Other methods for cloning full-length cDNA are described, for example, by Seki et al., Plant Journal 5:707-720
(1998) "High-efficiency cloning of Arabidopsis full-length cDNA by biotinylated Cap trapper"; Maruyama et al., Gene
138:171 (1994) "Oligo-capping a simple method to replace the cap structure of eukaryotic mRNAs with
oligoribonucleotides"; and WO 96/34981.

[2313] Tissues were, or each organ was, individually pulverized and frozen in liquid nitrogen. Next, the samples were
homogenized in the presence of detergents and then centrifuged. The debris and nuclei were removed from the sample
and more detergents were added to the sample. The sample was centrifuged and the debris was removed. Then the
sample was applied to a 2M sucrose cushion to isolate polysomes. The RNA was isolated by treatment with detergents
and proteinase K followed by ethanol precipitation and centrifugation. The polysomal RNA from the different tissues
was pooled according to the following mass ratios: 15/15/1 for male inflorescences, female inflorescences and root,
respectively. The pooled material was then used for cDNA synthesis by the methods described below.
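
The 15/15/1 pooling can be illustrated with a short calculation. The following is only a sketch: the function name and the 310 µg total are hypothetical and are not taken from the procedure; only the mass ratios come from the text above.

```python
from fractions import Fraction

def pool_masses(total_ug, ratios):
    """Split a total RNA mass (in µg) across tissues by integer mass ratios."""
    parts = sum(ratios.values())
    return {tissue: float(Fraction(total_ug) * r / parts)
            for tissue, r in ratios.items()}

# Mass ratios stated for the corn polysomal RNA pool.
corn_ratios = {"male inflorescence": 15, "female inflorescence": 15, "root": 1}

# Hypothetical total of 310 µg, chosen only so the shares come out as round numbers.
pooled = pool_masses(310, corn_ratios)
for tissue, ug in pooled.items():
    print(f"{tissue}: {ug:.1f} µg")
```

The same helper would cover the nine-to-one inflorescence-to-root proportion used for Arabidopsis by passing the ratios 9 and 1.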
[2314] Starting material for cDNA synthesis for the exemplary corn cDNA clones with sequences presented in REF
AND SEQ TABLES 1 AND 2 was poly(A)-containing polysomal mRNAs from inflorescences and root tissues of corn
plants grown from HYBRID SEED # 35A19. Male inflorescences and female (pre- and post-fertilization) inflorescences
were isolated at various stages of development. Selection for poly(A)-containing polysomal RNA was done using oligo
d(T) cellulose columns, as described by Cox and Goldberg, "Plant Molecular Biology: A Practical Approach", pp. 1-35,
Shaw ed., c. 1988 by IRL, Oxford. The quality and the integrity of the polyA+ RNAs were evaluated.
[2315] Starting material for cDNA synthesis for the exemplary Arabidopsis cDNA clones with sequences presented
in REF AND SEQ TABLES 1 AND 2 was polysomal RNA isolated from the top-most inflorescence tissues of Arabidopsis
thaliana Wassilewskija (Ws.) and from roots of Arabidopsis thaliana Landsberg erecta (L. er.), also obtained from the
Arabidopsis Biological Resource Center. Nine parts inflorescence to every part root was used, as measured by wet
mass. Tissue was pulverized and exposed to liquid nitrogen. Next, the sample was homogenized in the presence of
detergents and then centrifuged. The debris and nuclei were removed from the sample and more detergents were
added to the sample. The sample was centrifuged and the debris was removed and the sample was applied to a 2M
sucrose cushion to isolate polysomal RNA (Cox et al., "Plant Molecular Biology: A Practical Approach", pp. 1-35, Shaw
ed., c. 1988 by IRL, Oxford). The polysomal RNA was used for cDNA synthesis by the methods described below.
Polysomal mRNA was then isolated as described above for corn cDNA. The quality of the RNA was assessed
electrophoretically.

[2316] Following preparation of the mRNAs from various tissues as described above, selection of mRNA with intact
5' ends and specific attachment of an oligonucleotide tag to the 5' end of such mRNA was performed using either a
chemical or enzymatic approach. Both techniques take advantage of the presence of the "cap" structure, which
characterizes the 5' end of most intact mRNAs and which comprises a guanosine generally methylated once, at the 7
position.

[2317] The chemical modification approach involves the optional elimination of the 2', 3'-cis diol of the 3' terminal
ribose, the oxidation of the 2', 3'-cis diol of the ribose linked to the cap of the 5' ends of the mRNAs into a dialdehyde,
and the coupling of the dialdehyde thus obtained to a derivatized oligonucleotide tag. Further details regarding the
chemical approaches for obtaining mRNAs having intact 5' ends are disclosed in International Application No.
of the protein in specific organelles (plastids, as an example), secretion outside of the cell, targeting interaction with
pabular receptors, etc. Hence, the inclusion of signal proteins in constructs involving the SDFs otC^eT^Z 

SSTS^ 9Cne ^ US, "9 COmm ° n mo,ecular bi <*>9 ical techni ^ or can be seized in vitrT 
S?Li thanat '^ J s, 9 nal P e P ,lde sequences, both amino acid and nucleotide, described in the REF and 

SEQ tablescan be used to modulate polypeptide transport. Further variants of the native signal peptides descr^n 
1 , , SE T! ,nsertions - dele ^. or substitutions can be'made SuS JSSSS 

to toe sZ™° "" ° a,iVe ^ PSP,id8 38 33 eXhtoUin9 ^ d69fee °" ^nceTdJnS 

XesTmS B - iWenfon ~ USefU ' ^ - be *- «* — — Peptides 

V. Transformation Techniques 

[2301] A wide range of techniques for inserting exogenous polynucleotides are known for a number of host cells,
including, without limitation, bacterial, yeast, mammalian, insect and plant cells.
S 2 L TeCh " ,0f ( trans ' ormin 9 a variet / <* W9her plant species are well known and described in the tech- 

IISSSSK ^ e ' e ' 9 ' Weisin9 et al " * na 66,191 - 421 0 988,1 and Christou - «S 

[2303] DNA constructs of the invention may be introduced into the genome of the desired plant host by a variety of
conventional techniques. For example, the DNA construct may be introduced directly into the genomic DNA of the
plant cell using techniques such as electroporation and microinjection of plant cell protoplasts, or the DNA constructs
can be introduced directly to plant tissue using ballistic methods, such as DNA particle bombardment. Alternatively,
the DNA constructs may be combined with suitable T-DNA flanking regions and introduced into a conventional
Agrobacterium tumefaciens host vector. The virulence functions of the Agrobacterium tumefaciens host will direct the
insertion of the construct and adjacent marker into the plant cell DNA when the cell is infected by the bacteria.

S^SSrS? 0 " ,e f hn ! quOS are known h 4,13 art 30(1 we « ascribed in the scientific and patent literature. The 
COnS,nJCtS US ' ng P 01 ***™ Si/col precipitation is described in Paszkowski et al. EMBO J 3 
p ., r . ' f ,ectro P° ratlon techniques are described in Fromm et al. Proc. Natl Acad. Sci. USA 82:5824 (1885)' 
r ^ 1eChni<,UeS afe deSCfibed 1,1 Mein et aL Atato»az:773 (1987). ^rote^VmeS 
Tnt fSLSj™! I rT 65, inC ' Udin9 diSarmin9 and US6 °' bina ^ ° r ^e weiS^S 

7^7? f IT '° r eXamP ' e Hamill0n ' CM ' Ge ^200^07 (1997); Muller et al. Mol. Gen. Genet 207" 

oJ ( . « 8 P; K ° man etal - JS»85 (1996); Venkateswarlu et al. Biotechnology 9:1103 (1991) and GleaTe^P 

SLXf " - 1203 0992): GfaV6S ° 0tt ™ n ' * "* e/b ' ^ < 1986 ^" d Gouid efauS^^ 
[2305] Transformed plant cells which are derived by any of the above transformation techniques can be cultured to 

ness. Such regenerate techniques refy on manipulation of certain phytohormor.es in a tfesue cutture growth medium 
typed* rely^g on a bkxide and/or herbicide marker which has been introduced together with the aXJnSSl 
Se *T^ Regeneration from cultured protoptests is described in Evans et S Protop^so^Z aZcS^ 
n Handbook of Plant Cell Cutture,- pp. 124176. MacMfllan Publishing Company. New Vol 1983 aS B^nc r7 
geneotonotPtonts. Plant Protoplasts, pp. 2173. CRC Press. Boca Raton. ^S^^^S^l- 
from plant ca.lusexp.ants. organs, or parts thereof. Such regeneration techniques are describeTge^S.hTin S 
al. Ann. R^ot Plant Phys. 38.467 (1987). Regeneration of monocots (rice) is described by HomJ^ * Xef 

u£E5T P^>^ «*<*-"--■ < J BOtechnol.^ (1994)) . TheLleic^oft^ 
lion can be used to confer desired traits on essentially any plant. 

222. H! US ' ^ ^? n,i0n T USS ° Vef 3 bf0ad ran9e °' plan,s " inc,udin 9 s P«ies from the genera Anacardium 
f Avena, Brassica. Citrus, Citruttus. Capsicum. Cartnamus. Cocos, dffea. CucuZTcu- 
SETST? £fe9 ' S ' Fra9ara ' G ^ ^YPium. Helianthus. HetevcaUis. Hordeum. HyosTyamus, ZctZ 
Unum Lol,um,Lup,nus. LycopersKon, Malus, Manihot, Majorana, Medicago. Nicotians, Olea, OrZa Panieun "pTn- 

num. Sorghum. Theobromus. Tngonella, Triticum, Vida, Wis, Vigna, and Zea 

[2307] One of skill will recognize that after the expression cassette is stably incorporated in transgenic plants and 

2S ° Perab,e : " 080 bC in,r0dUCed im ° ° ,her P ,a " ,s b V —I crosshTAny of a STofSSS 
breeding techniques can be used, depending upon the species to be crossed siarraara 
[2308] The particular sequences of SDFs identified are provided in the attached REF AND SEQ TABLES 1 AND 2. 
tions result in detection of hybridization between sequences having at least 70% sequence identity. As described above,
the hybridization and wash conditions can be changed to reflect the desired percentage of sequence identity between 
probe and target sequences that can be detected. 

[2329] In the following procedure, a probe for hybridization is produced from two PCR reactions using two primers
from genomic sequence of Arabidopsis thaliana. As described above, the particular template for generating the probe
can be any desired template.

[2330] The first PCR product is assessed to confirm that it is of the expected size. Then
the product of the first PCR is used as a template, with the same pair of primers used in the first PCR, in a second
PCR that produces a labeled product used as the probe.

[2331] Fragments detected by hybridization, or other bands of interest, can be isolated from gels used to separate 
genomic DNA fragments by known methods for further purification and/or characterization. 

Buffers for nuclear DNA extraction 

[2332] 



1. 10X HB

                            1000 ml
40 mM spermidine            10.2 g     Spermine (Sigma S-2876) and spermidine (Sigma S-2501)
10 mM spermine              3.5 g      stabilize chromatin and the nuclear membrane
0.1 M EDTA (disodium)       37.2 g     EDTA inhibits nuclease
0.1 M Tris                  12.1 g     Buffer
0.8 M KCl                   59.6 g     Adjusts ionic strength for stability of nuclei

Adjust pH to 9.5 with 10 N NaOH. It appears that there is a nuclease present in leaves. Use of pH 9.5 appears
to inactivate this nuclease.



2. 2 M sucrose (684 g per 1000 ml)

Heat about half the final volume of water to about 50°C. Add the sucrose slowly, then bring the mixture to close
to final volume; stir constantly until it has dissolved. Bring the solution to volume.



3. Sarkosyl solution (lyses nuclear membranes)

                                 1000 ml
N-lauroyl sarcosine (Sarkosyl)   20.0 g
0.1 M Tris                       12.1 g
0.04 M EDTA (disodium)           14.9 g

Adjust the pH to 9.5 after all the components are dissolved and bring up to the proper volume.



4. 20% Triton X-100 

80 ml Triton X-100 

320 ml 1X HB (w/o β-ME and PMSF)

Prepare in advance; Triton takes some time to dissolve 
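
The gram amounts listed for the 10X HB stock follow from molarity times formula weight times volume. The sketch below reproduces them under stated assumptions: the formula weights are those of the salt forms that match the listed masses (spermidine trihydrochloride, spermine tetrahydrochloride, disodium EDTA dihydrate, Tris base, KCl). The recipe itself does not name the salt forms, so these are inferred, and the names used in the code are illustrative only.

```python
# Assumed formula weights in g/mol; the salt forms are inferred, not stated in the recipe.
FORMULA_WEIGHT = {
    "spermidine (trihydrochloride)": 254.63,
    "spermine (tetrahydrochloride)": 348.18,
    "EDTA (disodium, dihydrate)": 372.24,
    "Tris base": 121.14,
    "KCl": 74.55,
}

# Molar concentrations of the 10X HB stock as listed above.
MOLARITY_10X = {
    "spermidine (trihydrochloride)": 0.040,
    "spermine (tetrahydrochloride)": 0.010,
    "EDTA (disodium, dihydrate)": 0.1,
    "Tris base": 0.1,
    "KCl": 0.8,
}

def grams_needed(molarity, formula_weight, volume_l=1.0):
    """Mass of solute (g) = mol/L x g/mol x L."""
    return molarity * formula_weight * volume_l

recipe = {name: round(grams_needed(MOLARITY_10X[name], FORMULA_WEIGHT[name]), 1)
          for name in MOLARITY_10X}

for name, grams in recipe.items():
    print(f"{name}: {grams} g per 1000 ml")
```

Passing a different volume_l scales the recipe, for example 0.5 for a 500 ml batch.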

A. Procedure 

[2333] 

1. Prepare 1X HB buffer (keep ice-cold during use)

                            1000 ml
10X HB                      100 ml
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[231 9] In both the chemical and the enzymatic approach, the oliqonucleotide tao h*<: * r*^t ^ . , 

ss&sxr mRNA is exarrtned * performin9 a Northem ^ usin9 a p^sess !o £ 

£S? JS? 6 mR 1 AS ^ 10 ° ,i 9 onucteotide te 9s "sing either the chemical or the enzymatic method first strand 

S- rcf'T" 9 S f CO T d S,rand SyntheSfe ' 1116 ,u,| - ,en 9 th cDNAs are ^ed into a phagemid vector such as oBlua- 
Scnpt™ (Stratagene). The ends of the fun-length cDNAs are blunted with T4 DNA ooivmiLZ Z , u f . ? P 
is digested with EcoRI. Since methytated dCT?» is used during i f ^ ^ CDNA 
on.yhemi-methy.ated site; hence theonty site susceptible te^S^l^Z^^^^ 1 ^ 
ing, an Hind III adapter is added to the /end of fuoingt^^SJ '° faC * ato SUbCton ' 

K SToTas^nMAfn 0,i90nUC ' eo,ide te 9 ateche d to fulWenglh cONAs are selected as follows 
EXAMPLE 2: SOUTHERN HYBRIDIZATIONS 

[2328] The procedures described herein can be used to isolate related nnh/ni iH<w;h^ , ^ 

"** «*-»' *— « — » - «*- ^sssssce^sses 
12. Centrifuge at 184,000 x g for 16 to 20 hours in a fixed-angle rotor.



13. Remove the dark red supernatant that is at the top of the tube with a plastic transfer pipette and discard.
Carefully remove the DNA band with another transfer pipette. The DNA band is usually visible in room light;
otherwise, use a long wave UV light to locate the band.

14. Extract the ethidium bromide with isopropanol saturated with water and salt. Once the solution is clear, extract
at least two more times to ensure that all of the EtBr is gone. Be very gentle, as it is very easy to shear the DNA
at this step. This extraction may take a while because the DNA solution tends to be very viscous. If the solution is
too viscous, dilute it with TE.

15. Dialyze the DNA for at least two days against several changes (at least three times) of TE (10 mM Tris, 1 mM
EDTA, pH 8) to remove the cesium chloride.

16. Remove the dialyzed DNA from the tubing. If the dialyzed DNA solution contains a lot of debris, centrifuge the
DNA solution at least at 2500 x g for 10 min. and carefully transfer the clear supernatant to a new tube. Read the
A260 concentration of the DNA.



17. Assess the quality of the DNA by agarose gel electrophoresis (1% agarose gel) of the DNA. Load 50 ng and
100 ng (based on the OD reading) and compare it with known and good quality DNA. Undigested lambda DNA
and a lambda-HindIII-digested DNA are good molecular weight markers.
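
Steps 16 and 17 rely on converting an A260 reading into a DNA concentration and then into gel loading volumes. A minimal sketch follows, assuming the commonly used conversion of 50 µg/ml of double-stranded DNA per A260 unit at a 1 cm path length. That factor, the function names and the example reading of 0.25 at a 20-fold dilution are assumptions made for illustration and are not stated in the protocol.

```python
def dsdna_conc_ug_per_ml(a260, dilution_factor=1.0, ug_per_ml_per_a260=50.0):
    """Estimate dsDNA concentration from an A260 reading of a diluted sample.

    Assumes 1 A260 unit corresponds to 50 µg/ml dsDNA (1 cm path length).
    """
    return a260 * dilution_factor * ug_per_ml_per_a260

def load_volume_ul(target_ng, conc_ug_per_ml):
    """Volume (µl) holding target_ng of DNA; note that µg/ml equals ng/µl."""
    return target_ng / conc_ug_per_ml

# Hypothetical reading: A260 of 0.25 measured on a 1:20 dilution of the dialyzed DNA.
conc = dsdna_conc_ug_per_ml(0.25, dilution_factor=20)
print(f"stock: {conc:.0f} µg/ml")
for ng in (50, 100):
    print(f"load {ng} ng -> {load_volume_ul(ng, conc):.2f} µl")
```

Volumes this small are hard to pipette accurately, so in practice a working dilution of the stock would probably be prepared before loading.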



Protocol for Digestion of Genomic DNA 

Protocol:
[2334]

1. The relative amounts of DNA for different crop plants that provide approximately a balanced number of genome
equivalents are given in Table 3. Note that due to the size of the wheat genome, wheat DNA will be underrepresented.
Lambda DNA provides a useful control for complete digestion.



2. Precipitate the DNA by adding 3 volumes of 100% ethanol. Incubate at -20°C for at least two hours. Yeast DNA
can be purchased and made up at the necessary concentration; therefore no precipitation is necessary for yeast
DNA.



3. Centrifuge the solution at 11,400 x g for 20 min. Decant the ethanol carefully (be careful not to disturb the pellet).
Be sure that the residual ethanol is completely removed either by vacuum desiccation or by carefully wiping the 
sides of the tubes with a clean tissue. 

4. Resuspend the pellet in an appropriate volume of water. Be sure the pellet is fully resuspended before proceeding 
to the next step. This may take about 30 min. 

5. Add the appropriate volume of 10X reaction buffer provided by the manufacturer of the restriction enzyme to 
the resuspended DNA followed by the appropriate volume of enzymes. Be sure to mix it properly by slowly swirling 
the tubes. 



6. Set-up the lambda digestion-control for each DNA that you are digesting. 

7. Incubate both the experimental and lambda digests overnight at 37°C. Spin down condensation in a microfuge
before proceeding. 

8. After digestion, add 2 µl of loading dye (typically 0.25% bromophenol blue, 0.25% xylene cyanol in 15% Ficoll
or 30% glycerol) to the lambda-control digests and load in 1% TPE-agarose gel (TPE is 90 mM Tris-phosphate, 2
mM EDTA, pH 8). If the lambda DNA in the lambda control digests is completely digested, proceed with the
precipitation of the genomic DNA in the digests.



9. Precipitate the digested DNA by adding 3 volumes of 100% ethanol and incubating in -20°C for at least 2 hours 
(continued)

                            1000 ml
2 M sucrose                 250 ml     a non-ionic osmoticum
Water                       634 ml

Added just before use:

100 mM PMSF*                10 ml      a protease inhibitor; protects nuclear membrane proteins
β-mercaptoethanol           1 ml       inactivates nuclease by reducing disulfide bonds

* 100 mM PMSF (phenyl methyl sulfonyl fluoride, Sigma P-7626): add 0.0875 g to 5 ml 100% ethanol



2. Homogenize the tissue in a blender (use 300-400 ml of 1 *hr ~ r ki««^ , n 

buffer per gram of tissue Blenders ZnTZ^T T„ UHB ?° r blonder ). Be sure that you use 5-10 ml of HB 
the blended h ice "" ^ *" h ° m °9 enat8 «* It is necessary to put 

t^LZ tjz^^t^tj:t mefs into r ^ beaker - ^ firet * a 

and the fourth is through ^ 

by gentry -^^^S^'E^ 3 ^ ,0 "* *° *~* ™ U > 

5. Centrifuge the filtrate at 1200 x g for 20 min. at 4°C to pellet the nuclei.

Point nuclei again M 1 2M . m> x 9 Discard th. iwpematant 

w^a^^ 

by weighing 1 ml of solution and add CsCI H nece&aa^y tobrinq^o 1^7 a^L The Vi!^ 6 density of the solution 
(sucrose etc, and the refract index afcne J^^X£££^ *** 

11. Add 20 µl of 10 mg/ml EtBr per ml of solution.
B. Protocol for PCR Amplification of Genomic Fragments in Arabidopsis
Amplification procedures:
[2336]



1. Mix the following in a 0.20 ml PCR tube or 96-well PCR plate:

Volume       Stock                                          Final Amount or Conc.
0.5 µl       ~10 ng/µl genomic DNA                          5 ng
2.5 µl       10X PCR buffer                                 20 mM Tris, 50 mM KCl
0.75 µl      50 mM MgCl2                                    1.5 mM
1 µl         10 pmol/µl Primer 1 (Forward)                  10 pmol
1 µl         10 pmol/µl Primer 2 (Reverse)                  10 pmol
0.5 µl       5 mM dNTPs                                     0.1 mM
0.1 µl       5 units/µl Platinum Taq (Life Technologies,    1 unit
             Gaithersburg, MD) DNA Polymerase
(to 25 µl)   Water





2. The template DNA is amplified using a Perkin Elmer 9700 PCR machine:

1) 94°C for 10 min. followed by

2) 5 cycles:   94°C - 30 sec; 62°C - 30 sec; 72°C - 3 min

3) 5 cycles:   94°C - 30 sec; 58°C - 30 sec; 72°C - 3 min

4) 25 cycles:  94°C - 30 sec; 53°C - 30 sec; 72°C - 3 min

5) 72°C for 7 min. Then the reactions are stopped by chilling to 4°C.
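
The reaction mix and the cycling program above can be checked with a short calculation. This is only a sketch: the data structures and function names are illustrative, the final concentrations use the dilution relation C1 x V1 = C2 x V2 with the 25 µl reaction volume, and the run time counts programmed hold times only, ignoring ramping between temperatures.

```python
REACTION_UL = 25.0

def final_conc(volume_ul, stock_conc, total_ul=REACTION_UL):
    """C2 = C1 * V1 / V2, in the same units as the stock concentration."""
    return volume_ul * stock_conc / total_ul

# Volumes (µl) of the non-water components listed in step 1.
volumes_ul = {
    "genomic DNA": 0.5, "10X PCR buffer": 2.5, "MgCl2": 0.75,
    "primer 1": 1.0, "primer 2": 1.0, "dNTPs": 0.5, "polymerase": 0.1,
}
water_ul = REACTION_UL - sum(volumes_ul.values())

mgcl2_mM = final_conc(volumes_ul["MgCl2"], 50.0)   # 50 mM stock
dntp_mM = final_conc(volumes_ul["dNTPs"], 5.0)     # 5 mM stock

# Cycling program from step 2: (number of cycles, [(temperature in °C, seconds), ...]).
PROGRAM = [
    (1, [(94, 600)]),
    (5, [(94, 30), (62, 30), (72, 180)]),
    (5, [(94, 30), (58, 30), (72, 180)]),
    (25, [(94, 30), (53, 30), (72, 180)]),
    (1, [(72, 420)]),
]

def hold_minutes(program):
    """Total programmed hold time in minutes (ramp times not included)."""
    seconds = sum(cycles * sum(sec for _, sec in steps) for cycles, steps in program)
    return seconds / 60

print(f"water: {water_ul:.2f} µl, MgCl2: {mgcl2_mM} mM, dNTPs: {dntp_mM} mM")
print(f"programmed time: {hold_minutes(PROGRAM):.0f} min")
```

Stepping the annealing temperature down from 62°C to 58°C to 53°C in successive blocks resembles what is often called a touchdown profile; the program list makes that structure easy to adjust.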



[2337] The procedure can be adapted to a multi-well format if necessary. 
Quantification and Dilution of PCR Products:

[2338]

1. The product of the PCR is analyzed by electrophoresis in a 1% agarose gel. A linearized plasmid DNA can be
used as a quantification standard (usually at 50, 100, 200, and 400 ng). These will be used as references to
approximate the amount of PCR products. HindIII-digested Lambda DNA is useful as a molecular weight marker.
The gel can be run fairly quickly; e.g., at 100 volts. The standard gel is examined to determine that the size of the
PCR products is consistent with the expected size and whether there are significant extra bands or smeary products in
the PCR reactions.

2. The amounts of PCR products can be estimated on the basis of the plasmid standard.

3. For the small number of reactions that produce extraneous bands, a small amount of DNA from bands with the
correct size can be isolated by dipping a sterile 10-µl tip into the band while viewing through a UV transilluminator.
The small amount of agarose gel (with the DNA fragment) is used in the labeling reaction.
(preferably overnight).

EXCEPTION: Arabidopsis and yeast DNA are digested in an appropriate volume; they don't have to be precipitated.

10. Resuspend the DNA in an appropriate volume of TE (e.g., 22 µl x 50 blots = 1100 µl) and an appropriate volume
of 10X loading dye (e.g., 2.4 µl x 50 blots = 120 µl). Be careful in pipetting the loading dye - it is viscous. Be sure
you are pipetting the correct volume.



Table 3

Some guide points in digesting genomic DNA.

Species       Genome Size   Size Relative    Genome Equivalent to    Amount of DNA
                            to Arabidopsis   2 µg Arabidopsis DNA    per blot
Arabidopsis   120 Mb        1X               1X                      2 µg
Brassica      1,100 Mb      9.2X             0.54X                   10 µg
Corn          2,800 Mb      23.3X            0.43X                   20 µg
Cotton        2,300 Mb      19.2X            0.52X                   20 µg
Oat           11,300 Mb     94X              0.11X                   20 µg
Rice          400 Mb        3.3X             0.75X                   5 µg
Soybean       1,100 Mb      9.2X             0.54X                   10 µg
Sugarbeet     758 Mb        6.3X             0.8X                    10 µg
Sweetclover   1,100 Mb      9.2X             0.54X                   10 µg
Wheat         16,000 Mb     133X             0.08X                   20 µg
Yeast         15 Mb         0.12X            1X                      0.25 µg
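
The columns of Table 3 are related by simple arithmetic: the size relative to Arabidopsis is the genome size divided by 120 Mb, and the genome equivalent is the amount of DNA loaded divided by the amount that would match 2 µg of Arabidopsis DNA. The sketch below recomputes a few rows under that reading of the table. The function name is illustrative, and the rice amount of 5 µg is taken as the value consistent with the 0.75X entry, since the printed figure is not clearly legible.

```python
ARABIDOPSIS_MB = 120.0   # reference genome size from Table 3
REFERENCE_UG = 2.0       # amount of Arabidopsis DNA per blot

def genome_equivalent(genome_mb, dna_ug):
    """Genome copies loaded, relative to 2 µg of Arabidopsis DNA."""
    relative_size = genome_mb / ARABIDOPSIS_MB
    return dna_ug / (relative_size * REFERENCE_UG)

# (genome size in Mb, µg DNA per blot) for selected rows of Table 3.
rows = {
    "Arabidopsis": (120, 2),
    "Rice": (400, 5),
    "Wheat": (16000, 20),
    "Yeast": (15, 0.25),
}

equivalents = {species: genome_equivalent(mb, ug) for species, (mb, ug) in rows.items()}
for species, eq in equivalents.items():
    print(f"{species}: {eq:.3f}X")
```

The wheat value of about 0.075X shows why wheat is described as underrepresented: matching Arabidopsis would take roughly 267 µg of wheat DNA per blot, far more than the 20 µg loaded.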



Protocol for Southern Blot Analysis 

The digested DNA samples are electrophoresed in 1% agarose gels in 1X TPE buffer. Low voltage, overnight
separations are preferred. The gels are stained with EtBr and photographed.

1. For blotting the gels, first incubate the gel in 0.25 N HCl (with gentle shaking) for about 15 min.

I I2m nS to7 5 m£ Wa,er ^ ^ d8natUred * 2 inCUba,i ° nS lnCUba,e < with shaki "9> « 0.5 M NaOH 

15^ e NaC| S fS 15 "* neUtrafeed by fr™**** (with shaking) in 1.5 M Tris pH 7.5 in 

4. A nylon membrane is prepared by soaking it in water for at least 5 min., then in 6X SSC for at least 15 min.
before use (20X SSC is 175.3 g NaCl, 88.2 g sodium citrate per liter, adjusted to pH 7.0).

5. The nylon membrane is placed on top of the gel and all bubbles in between are removed. The DNA is blotted
from the gel to the membrane using an absorbent medium, such as paper toweling and 6X SSC buffer. After the
transfer, the membrane may be lightly brushed with a gloved hand to remove any agarose sticking to the surface.

L^u™ A ^ *** f ' Xed l ° membran0 by UV ^'inking and baking at WC. The membrane is stored at 4«C 
1) 94°C for 10 min.

2) 5 cycles:   95°C - 30 sec; 61°C - 1 min; 73°C - 5 min

3) 5 cycles:   95°C - 30 sec; 59°C - 1 min; 75°C - 5 min

4) 25 cycles:  95°C - 30 sec; 51°C - 1 min; 73°C - 5 min

5) 72°C for 8 min. The reactions are terminated by chilling to 4°C (hold).



3. The products are analyzed by electrophoresis in a 1% agarose gel, comparing to an aliquot of the unlabeled
probe starting material.

4. The amount of DIG-labeled probe is determined as follows: 

Make serial dilutions of the diluted control DNA in dilution buffer (TE: 10 mM Tris and 1 mM EDTA, pH 8) as 
shown in the following table: 



DIG-labeled control DNA    Stepwise Dilution     Final Conc.
starting conc.                                   (Dilution Name)
5 ng/µl                    1 µl in 49 µl TE      100 pg/µl (A)
100 pg/µl (A)              25 µl in 25 µl TE     50 pg/µl (B)
50 pg/µl (B)               25 µl in 25 µl TE     25 pg/µl (C)
25 pg/µl (C)               20 µl in 30 µl TE     10 pg/µl (D)



a. Serial dilutions of a DIG-labeled standard DNA ranging from 100 pg to 10 pg are spotted onto a positively
charged nylon membrane, marking the membrane lightly with a pencil to identify each dilution.

b. Serial dilutions (e.g., 1:50, 1:2500, 1:10,000) of the newly labeled DNA probe are spotted.

c. The membrane is fixed by UV crosslinking. 

d. The membrane is wetted with a small amount of maleate buffer and then incubated in 1% blocking solution 
for 15 min at room temp. 

e. The labeled DNA is then detected using alkaline phosphatase conjugated anti-DIG antibody (Boehringer
Mannheim, Indianapolis, IN, cat. no. 1093274) and an NBT substrate according to the manufacturer's
instructions.

f. Spot intensities of the control and experimental dilutions are then compared to estimate the concentration 
of the PCR-DIG-labeled probe. 
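
The stepwise dilutions of the control DNA, and the back-calculation in step f, can be sketched as follows. The function names are illustrative. The example in which a 1:2500 dilution of the probe matches the 10 pg/µl standard is hypothetical, and it assumes equal volumes of standard and probe are spotted.

```python
def dilute(conc_pg_per_ul, sample_ul, diluent_ul):
    """Concentration after mixing sample_ul of stock with diluent_ul of TE."""
    return conc_pg_per_ul * sample_ul / (sample_ul + diluent_ul)

# (dilution name, µl of previous solution, µl of TE), as in the dilution table.
STEPS = [("A", 1, 49), ("B", 25, 25), ("C", 25, 25), ("D", 20, 30)]

conc = 5000.0  # starting control DNA: 5 ng/µl expressed in pg/µl
series = {}
for name, sample_ul, te_ul in STEPS:
    conc = dilute(conc, sample_ul, te_ul)
    series[name] = conc

def probe_conc_ng_per_ul(matched_standard_pg_per_ul, probe_dilution_factor):
    """Estimate the undiluted probe concentration from the matching spot."""
    return matched_standard_pg_per_ul * probe_dilution_factor / 1000.0

print(series)
# Hypothetical outcome: the 1:2500 probe spot looks like standard D (10 pg/µl).
print(probe_conc_ng_per_ul(series["D"], 2500), "ng/µl")
```

Because spot intensities are compared by eye, the result is best treated as an order-of-magnitude estimate.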

D. Prehybridization and Hybridization of Southern Blots 

Solutions: 
[2341] 



100% Formamide    purchased from Gibco

20X SSC           (1X = 0.15 M NaCl, 0.015 M Na3citrate)
                  per L:  175 g NaCl
                          87.5 g Na3citrate·2H2O
C. Protocol for PCR-DIG-Labeling of DNA

Solutions: 
[2339] 

10X dNTP + DIG-11-dUTP [1:5]: (2 mM dATP, 2 mM dCTP, 2 mM dGTP, 1.65 mM dTTP, 0.35 mM DIG-11-dUTP)

10X dNTP + DIG-11-dUTP [1:10]: (2 mM dATP, 2 mM dCTP, 2 mM dGTP, 1.81 mM dTTP, 0.19 mM DIG-11-dUTP)

10XdNTP + DIG-11^UTP^ 

TE buffer (10 mM Tris, 1 mM EDTA, pH 8) 

Maleate buffer: In 700 ml of deionized distilled water, dissolve 11.61 g maleic acid and 8.77 g NaCl. Add NaOH to
adjust the pH to 7.5. Bring the volume to 1 L. Stir for 15 min. and sterilize.

tX^ diStn,ed 6]SS ° We 1169 maWB acid Ne * add N ^OH to adjust 

L!?7 ^£^ 10 ^ Cat no 1096176) 

Heat to 60*C wh.le storing to dissolve the powder. Adjust the volume to 100 ml withvSer. StirSd sLS 

1% blocking solution: Dilute the 10% stock to 1% using the maleate buffer. 

9?5M ^r M IT^ M ^ 50 mM ***** PH95) - from au,octev ^ solutions of 1M Tris pH 

9-5, 5 M NaCI, and 1 M MgCfe tn autoclaved distilled water. 

Procedure: 
[2340] 

1. PCR reactions are performed in 25 µl volumes containing:

PCR buffer                   1X
MgCl2                        1.5 mM
10X dNTP + DIG-11-dUTP       1X (please see the note below)
Platinum Taq™ Polymerase     1 unit
10 pg probe DNA
10 pmol primer 1



Note:

                                  Use for
10X dNTP + DIG-11-dUTP (1:5)      <1 kb
10X dNTP + DIG-11-dUTP (1:10)     1 kb to 1.8 kb
10X dNTP + DIG-11-dUTP (1:15)     >1.8 kb



2. The PCR reaction uses the following amplification cycles: 



EP 1 033 405 A2 



Washing buffer: Maleic acid buffer with 0.3% (v/v) Tween 20.

Blocking stock solution (10X concentration): 10% blocking reagent in buffer 1. Dissolve blocking reagent powder
(Boehringer Mannheim, Indianapolis, IN, cat. no. 1096176) by constantly stirring
on a 65°C heating block or heat in a microwave, autoclave and store at 4°C.

Buffer 2 (1X blocking solution): Dilute the stock solution 1:10 in Buffer 1.

Detection buffer: 0.1 M Tris, 0.1 M NaCl, pH 9.5
Procedure: 



[2344]



11. Expose for 2 hours at room temperature to X-ray film. Multiple exposures can be taken. Luminescence continues
for at least 24 hours and signal intensity increases during the first hours. 

Example 3: Transformation of Carrot Cells 

[2345] Transformation of plant cells can be accomplished by a number of methods, as described above. Similarly, 
a number of plant genera can be regenerated from tissue culture following transformation. Transformation and regen- 
eration of carrot cells as described herein is illustrative. 

[2346] Single cell suspension cultures of carrot (Daucus carota) cells are established from hypocotyls of cultivar
Early Nantes in B5 growth medium (O.L. Gamborg et al., Plant Physiol 45:372 (1970)) plus 2,4-D and 15 mM CaCl2
(B5-44 medium) by methods known in the art. The suspension cultures are subcultured by adding 10 ml of the
suspension culture to 40 ml of B5-44 medium in 250 ml flasks every 7 days and are maintained in a shaker at 150 rpm at
27°C in the dark.



1. After the post-hybridization wash the blots are briefly rinsed (1-5 min.) in the maleate washing buffer with gentle
shaking.

2. Then the membranes are incubated for 30 min. in Buffer 2 with gentle shaking.

3. Anti-DIG-AP conjugate (Boehringer Mannheim, Indianapolis, IN, cat. no. 1093274) at 75 mU/ml (1:10,000) in
Buffer 2 is used for detection. 75 ml of solution can be used for 3 blots.

4. The membrane is incubated for 30 min. in the antibody solution with gentle shaking.

5. The membranes are washed twice in washing buffer with gentle shaking. About 250 ml is used per wash for 3
blots.

6. The blots are equilibrated for 2-5 min. in 60 ml detection buffer.

7. Dilute CSPD (1:200) in detection buffer. (This can be prepared ahead of time and stored in the dark at 4°C.)
The following steps must be done individually. Bags (one for detection and one for exposure) are generally cut
and ready before doing the following steps.

8. The blot is carefully removed from the detection buffer and excess liquid removed without drying the membrane.
The blot is immediately placed in a bag and 1.5 ml of CSPD solution is added. The CSPD solution can be spread
over the membrane. Bubbles present at the edge and on the surface of the blot are typically removed by gentle
rubbing. The membrane is incubated for 5 min. in CSPD solution.

9. Excess liquid is removed and the membrane is blotted briefly (DNA side up) on Whatman 3MM paper. Do not
let the membrane dry completely.

10. Seal the damp membrane in a hybridization bag and incubate for 10 min. at 37°C to enhance the luminescent
reaction.
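The two dilutions in the detection steps (the antibody conjugate at 1:10,000 in 75 ml, and CSPD at 1:200 with 1.5 ml used per blot) can be checked with a short calculation. This is only a sketch: it assumes that "1:N" means one part stock in N parts final volume, and the function name is illustrative rather than part of the protocol.

```python
def stock_volume_ul(final_volume_ml: float, dilution_factor: float) -> float:
    """Microliters of stock needed for a 1:dilution_factor dilution.

    Assumes 1:N means 1 part stock in N parts of final volume (v/v).
    """
    return final_volume_ml * 1000.0 / dilution_factor


# Anti-DIG-AP conjugate: 1:10,000 in 75 ml of Buffer 2 (enough for 3 blots).
antibody_ul = stock_volume_ul(75.0, 10_000)

# CSPD: 1:200 in detection buffer, 1.5 ml of diluted solution per blot.
cspd_ul_per_blot = stock_volume_ul(1.5, 200)

print(f"conjugate: {antibody_ul:.1f} ul, CSPD per blot: {cspd_ul_per_blot:.1f} ul")
```

Under that reading, both steps call for 7.5 microliters of stock; a different dilution convention would shift the numbers slightly.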
10% Blocking Reagent: In 80 ml deionized distilled water, dissolve 1.16 g maleic acid. Next, add NaOH to adjust
the pH to 7.5. Add 10 g of the blocking reagent powder. Heat to 60°C while stirring to dissolve the powder. Adjust
the volume to 100 ml with water. Stir and sterilize.
Prehybridization Mix:

Final Concentration   Components         Volume (per 100 ml)   Stock
50%                   Formamide          50 ml                 100%
5X                    SSC                25 ml                 20X
0.1%                  Sarkosyl           0.5 ml                20%
0.02%                 SDS                0.1 ml                20%
2%                    Blocking Reagent   20 ml                 10%
                      Water              4.4 ml
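The volumes in the prehybridization mix follow from the stock and final concentrations (stock volume = total volume x final / stock), with water making up the remainder. The sketch below recomputes them for an arbitrary batch size; the names COMPONENTS and mix_volumes are illustrative and not taken from the source.

```python
# (component, stock concentration, final concentration); each pair shares a unit
COMPONENTS = [
    ("Formamide", 100.0, 50.0),        # percent
    ("SSC", 20.0, 5.0),                # X
    ("Sarkosyl", 20.0, 0.1),           # percent
    ("SDS", 20.0, 0.02),               # percent
    ("Blocking Reagent", 10.0, 2.0),   # percent
]


def mix_volumes(total_ml: float) -> dict:
    """Return the volume (ml) of each stock, plus water, for a batch of total_ml."""
    volumes = {
        name: round(total_ml * final / stock, 3)
        for name, stock, final in COMPONENTS
    }
    # Water fills whatever volume the stocks do not account for.
    volumes["Water"] = round(total_ml - sum(volumes.values()), 3)
    return volumes


recipe = mix_volumes(100.0)
for name, ml in recipe.items():
    print(f"{name:<18}{ml:>6} ml")
```

For a 100 ml batch this reproduces the table, including the 4.4 ml of water; calling mix_volumes(30.0) would scale the recipe to the 30 ml hybridization volume mentioned in the procedure.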





General Procedures: 
[2342] 

1. Place the blot in a heat-sealable plastic bag and add an appropriate volume of prehybridization solution (30 ml/
[...]). Seal the bag with a heat sealer, avoiding bubbles as much as possible. Lay down
the bags in a large plastic tray (one tray can accommodate at least 4-5 bags). Ensure that the bags are lying
flat so that the prehybridization solution is evenly distributed throughout the bag. Incubate the blot for at
least 2 hours with gentle agitation using a waver shaker.

2. Denature DIG-labeled DNA probe by incubating for 10 min. at 98°C using the PCR machine and immediately
[...]

[...] probe [...] (25 ng/ml; 30 ml = 750 ng total probe) [...] mix [...]

5. Incubate with gentle agitation for at least 16 hours. 

6. Proceed to medium stringency post-hybridization wash: 

Three times for 20 min. each with gentle agitation using 1X SSC, 1% SDS at 60°C.

All wash solutions must be prewarmed to 60°C. Use about 100 ml of wash solution per membrane.

To avoid background, keep the membranes fully submerged to avoid drying in spots; agitate sufficiently to
avoid having membranes stick to one another.

7. After the wash, proceed to immunological detection and CSPD development.
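The quantities in the general procedure scale linearly: probe mass is the stated 25 ng/ml times the hybridization volume, and wash solution is about 100 ml per membrane for each of the three washes. A minimal sketch follows, assuming fresh wash solution is used for every wash (the source does not say whether it may be reused); the function names are illustrative.

```python
PROBE_NG_PER_ML = 25.0          # probe concentration given in the procedure
WASH_ML_PER_MEMBRANE = 100.0    # "about 100 ml of wash solution per membrane"
WASHES = 3                      # three 20-minute washes at 60 C


def probe_ng(hyb_volume_ml: float) -> float:
    """Total probe mass (ng) for a given hybridization volume."""
    return PROBE_NG_PER_ML * hyb_volume_ml


def wash_solution_ml(n_membranes: int) -> float:
    """Wash solution to prewarm, assuming fresh solution for each wash."""
    return WASHES * WASH_ML_PER_MEMBRANE * n_membranes


print(probe_ng(30.0))        # 30 ml of hybridization solution
print(wash_solution_ml(4))   # four membranes
```

With 30 ml of hybridization solution this gives the 750 ng of probe quoted in the procedure.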
E. Procedure for Immunological Detection with CSPD 

Solutions: 
[2343] 

Buffer 1: Maleic acid buffer (0.1 M maleic acid, 0.15 M NaCl; adjusted to pH 7.5 with NaOH)
7. The isolated nucleic acid molecule of any one of claims 1-5, wherein said nucleic acid is capable of functioning as
a promoter, a 3' end termination sequence, an untranslated region (UTR), or as a regulatory sequence.

8. The isolated nucleic acid molecule of claim 7, wherein said nucleic acid is a promoter and comprises a sequence
selected from the group consisting of a TATA box sequence, a CAAT box sequence, a motif of GCAATCG or any
transcription-factor binding sequence, and any combination thereof.

9. The isolated nucleic acid molecule of claim 7, wherein the nucleic acid sequence is a regulatory sequence which
is capable of promoting seed-specific expression, embryo-specific expression, ovule-specific expression,
tapetum-specific expression or root-specific expression of a sequence or any combination thereof.

10. A vector construct comprising a nucleic acid molecule according to any one of claims 1-9, wherein said nucleic
acid molecule is heterologous to any element in said vector construct.

11. A vector construct according to claim 10 comprising:

(a) a first nucleic acid having a regulatory sequence capable of causing transcription and/or translation; and

(b) a second nucleic acid having the sequence of said isolated nucleic acid molecule according to any one of
claims 1-4;

wherein said first and second nucleic acids are operably linked and wherein said second nucleic acid is
heterologous to any element in said vector construct.

12. The vector construct according to claim 11, wherein said first nucleic acid is native to said second nucleic acid.

13. The vector construct according to claim 11, wherein said first nucleic acid is heterologous to said second nucleic
acid.

14. A vector construct according to claim 10 comprising:

(c) a first nucleic acid having the sequence of said isolated nucleic acid molecule according to claim
7; and

(d) a second nucleic acid;

wherein said first and second nucleic acids are operably linked and wherein said first nucleic acid is heterologous
to any element in said vector construct.

15. The vector construct according to claim 14, wherein said first nucleic acid is native to said second nucleic acid.

16. The vector construct according to claim 14, wherein said first nucleic acid is heterologous to said second nucleic
acid.

17. A host cell comprising an isolated nucleic acid molecule according to any one of claims 1-4, wherein said nucleic
acid molecule is flanked by exogenous sequence.

18. A host cell comprising a vector construct of any one of claims 10-16.

19. An isolated polypeptide comprising an amino acid sequence

(a) exhibiting at least 40% sequence identity of an amino acid sequence encoded by a sequence shown in
REF and/or SEQ Table 1 or 2 or a fragment thereof; and

(b) capable of exhibiting at least one of the biological activities of the polypeptide encoded by said nucleotide
sequence shown in REF and/or SEQ Table 1 or 2 or a fragment thereof.

20. The isolated polypeptide of claim 19, wherein said amino acid sequence exhibits at least 75% sequence identity
to an amino acid sequence encoded by a sequence shown in SEQ Table 1 or 2 or a fragment thereof.

21. The isolated polypeptide of claim 19, wherein said amino acid sequence exhibits at least 85% sequence identity 
[2347] The suspension culture cells are transformed with exogenous DNA as described by Z. Chen et al., Plant Mol.
Bio. 36:163 (1998). Briefly, 4-days post-subculture cells are incubated with cell wall digestion solution containing 0.4
M sorbitol, 2% driselase, 5 mM MES (2-[N-Morpholino]ethanesulfonic acid) pH 5.0 for 5 hours. The digested cells are
pelleted gently at 60 x g for 5 min. and washed twice in W5 solution containing 154 mM NaCl, 5 mM KCl, 125 mM CaCl2
and 5 mM glucose, pH 6.0. The protoplasts are suspended in MC solution containing 5 mM MES, 20 mM CaCl2, 0.5
M mannitol, pH 5.7 and the protoplast density is adjusted to about 4 x 10^6 protoplasts per ml.
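Adjusting the density to about 4 x 10^6 protoplasts per ml is a simple division once the total yield is known. The sketch below is illustrative only: the yield of 2 x 10^7 protoplasts is a made-up example value, and the 0.9 ml aliquot is the volume used per transformation in this example.

```python
TARGET_PER_ML = 4e6   # target density stated in the text
ALIQUOT_ML = 0.9      # protoplast volume mixed with plasmid DNA per transformation


def resuspension_volume_ml(total_protoplasts: float,
                           target_per_ml: float = TARGET_PER_ML) -> float:
    """Volume of MC solution that brings the suspension to the target density."""
    return total_protoplasts / target_per_ml


def protoplasts_per_aliquot(aliquot_ml: float = ALIQUOT_ML,
                            density_per_ml: float = TARGET_PER_ML) -> float:
    """Number of protoplasts in one transformation aliquot."""
    return aliquot_ml * density_per_ml


example_yield = 2e7   # hypothetical yield, not a value from the source
print(resuspension_volume_ml(example_yield))   # ml of MC solution
print(protoplasts_per_aliquot())               # protoplasts per 0.9 ml aliquot
```

At the stated density, each 0.9 ml aliquot holds roughly 3.6 x 10^6 protoplasts.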
[2348] 15-60 µg of plasmid DNA is mixed with 0.9 ml of protoplasts. The resulting suspension is mixed with 40%
polyethylene glycol (MW 8000, PEG 8000), by gentle inversion a few times at room temperature for 5 to 25 min.
Protoplast culture medium known in the art is added into the PEG-DNA-protoplast mixture. Protoplasts are incubated
in the culture medium for 24 hours to 5 days and cell extracts can be used for assay of transient expression of the
introduced gene. Alternatively, transformed cells can be used to produce transgenic callus, which in turn can be used
to produce transgenic plants, by methods known in the art. See, for example, Nomura and Komamine, Plant Physiol. 79:
988-991 (1985), Identification and Isolation of Single Cells that Produce Somatic Embryos in Carrot Suspension
Cultures.

[2349] The invention being thus described, it will be apparent to one of ordinary skill in the art that various
modifications of the materials and methods for practicing the invention can be made. Such modifications are to be considered
within the scope of the invention as defined by the following claims.

[2350] Each of the references from the patent and periodical literature cited herein is hereby expressly incorporated 
in its entirety by such citation. 
1. An isolated nucleic acid molecule comprising a nucleic acid having a nucleotide sequence which encodes an amino
acid sequence exhibiting at least 40% sequence identity to an amino acid sequence encoded by 

(a) a nucleotide sequence described in REF and/or SEQ Table 1 or 2 or a fragment thereof; or 

(b) a complement of a nucleotide sequence shown in REF and/or SEQ Table 1 or 2 or a fragment thereof.

2. An isolated nucleic acid molecule comprising a nucleic acid having a nucleotide sequence which exhibits at least 
65% sequence identity to 

(a) a nucleotide sequence shown in REF and/or SEQ Table 1 or 2 or a fragment thereof; or 

(b) a complement of a nucleotide sequence shown in REF and/or SEQ Table 1 or 2 or a fragment thereof.

3. An isolated nucleic acid molecule comprising a nucleic acid having a nucleotide sequence which exhibits at least 
65% sequence identity to a gene comprising 

(a) a nucleotide sequence shown in REF and/or SEQ Table 1 or 2 or a fragment thereof; or 

(b) a complement of a nucleotide sequence shown in REF and/or SEQ Table 1 or 2 or a fragment thereof. 

4. An isolated nucleic acid molecule which is the reverse of the isolated nucleotide sequence according to any one
of claims 1-3, such that the reverse nucleotide sequence has a sequence order which is the reverse of the sequence
order of said isolated nucleotide sequence according to any one of claims 1-3.

5. An isolated nucleic acid molecule comprising a nucleic acid capable of hybridizing to a nucleic acid having a 
sequence selected from the group consisting of: 

(a) a nucleotide sequence which is shown in REF and/or SEQ Table 1 or 2; and 

(b) a nucleotide sequence which is complementary to a nucleotide sequence shown in REF and/or SEQ Table 
1 or 2; 

under conditions that permit formation of a nucleic acid duplex at a temperature from about 40°C and 48°C below 
the melting temperature of the nucleic acid duplex. 

6. The nucleic acid molecule according to any one of claims 1-5, wherein said nucleic acid comprises an open reading
frame.
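Claims 1-4 turn on three sequence operations that are easy to confuse: percent identity, the complement, and the reverse (same bases in opposite order, which is not the reverse complement). The sketch below illustrates them on the GCAATCG motif named in claim 8. It is an illustration only: it assumes the two sequences are already aligned to equal length and counts gap columns as mismatches, which may differ from the way the description measures identity.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse(seq: str) -> str:
    """Same bases in the opposite order (the operation described in claim 4)."""
    return seq[::-1]


def complement(seq: str) -> str:
    """Base-by-base complement, order unchanged."""
    return seq.translate(_COMPLEMENT)


def reverse_complement(seq: str) -> str:
    """Complement read in the opposite order (the opposite strand, 5' to 3')."""
    return complement(seq)[::-1]


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned columns with identical residues.

    Assumes pre-aligned, equal-length inputs; '-' marks a gap and counts
    as a mismatch unless both sequences have a gap in that column.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


motif = "GCAATCG"
print(reverse(motif), complement(motif), reverse_complement(motif))
print(round(percent_identity("GCAATCG", "GCATTCG"), 1))
```

One mismatch in seven positions gives about 85.7% identity, which would satisfy the 65% threshold of claims 2 and 3 under this simplified measure.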
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to an amino acid sequence encoded by a sequence shown in SEQ Table 1 or 2 or a fragment thereof. 

22. The isolated polypeptide of claim 19, wherein said amino acid sequence exhibits at least 90% sequence identity
to an amino acid sequence encoded by a sequence shown in SEQ Table 1 or 2 or a fragment thereof.

23. An antibody capable of binding the isolated polypeptide of any one of claims 19-22.

24. A method of introducing an isolated nucleic acid into a host cell comprising:

(a) providing an isolated nucleic acid molecule according to any one of claims 1-4; and

(b) contacting said isolated nucleic with said host cell under conditions that permit insertion of said nucleic
acid into said host cell.

25. A method of transforming a host cell which comprises contacting a host cell with a vector construct according to
any one of claims 10-16.

26. A method of modulating transcription and/or translation of a nucleic acid in a host cell comprising: 

(a) providing the host cell of claim 24 or 25; and
(b) culturing said host cell under conditions that permit transcription or translation.

27. A method for detecting a nucleic acid in a sample which comprises:

(a) providing an isolated nucleic acid molecule according to any one of claims 1-5;
(b) contacting said isolated nucleic acid molecule with a sample under conditions which permit a comparison
of the sequence of said isolated nucleic acid molecule with the sequence of DNA in said sample; and

(c) analyzing the result of said comparison.

28. The method according to claim 27, wherein said isolated nucleic acid molecule and said sample are contacted
under conditions which permit the formation of a duplex between complementary nucleic acid sequences.

29. A plant or cell of a plant which comprises a nucleic acid molecule according to any one of claims 1-4 which is 
exogenous to said plant or plant cell. 

30. A plant or cell of a plant which comprises a nucleic acid molecule according to any one of claims 1-4, wherein said
nucleic acid molecule is heterologous to said plant or said cell of a plant.

31. A plant or cell of a plant which has been transformed with a nucleic acid molecule according to any one of claims 1-4.

32. A plant or cell of a plant which comprises a vector construct according to any one of claims 10-16.

33. A plant or cell of a plant which has been transformed with a vector construct according to any one of claims 10-16.

34. A plant which has been regenerated from a plant cell according to any one of claims 29-33. 

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